HUMAN BIOLOGY


a record of research


SEPTEMBER, 1951 


VOL. 23 No. 3 


A SIMPLE STOCHASTIC MODEL OF RECOVERY, 
RELAPSE, DEATH AND LOSS OF PATIENTS 


BY EVELYN FIX AND JERZY NEYMAN¹


1. INTRODUCTION 


The authors’ attention was drawn to the present problem by an
interesting discussion at the New York statistical meetings in
December, 1949 [1].² This interest was later reinforced by acquaintance
with the statistical studies of the effects of treatment of cancer
of the breast by Dr. Robnett and associates [2], and by Dr. Harrington 
[3, 4, 5]. These studies are concerned with the differences in effect either
of the same treatment applied to different categories of patients or of 
different treatments applied to a specified category of patients. In all 
cases the criterion for comparison is the frequency of surviving specified 
periods of time. Dr. Robnett’s study is based on 203 patients who have 
been classified into several distinct categories, the largest of which 


¹ This paper was prepared, partly, using the facilities provided by the contracts
with the Office of Naval Research and with the School of Aviation
Medicine, Randolph Field.

² Figures in square brackets refer to the list of literature at the end of this
paper.




206 E. FIX AND J. NEYMAN 


contains the initial number of 71 patients while the smallest contains 
only 3. Dr. Harrington’s material is overwhelmingly larger and in his 
last paper [5] he deals with 7138 patients who, since 1910, have passed 
through the operation theatres of the Mayo Clinic. In addition, with 
the acknowledged help of the Division of Biometry and Medical Statistics 
of the Mayo Clinic, Dr. Berkson in charge, the material of Dr. Harring- 
ton includes a very large number of traced patients, always more than 
97 per cent of the total.³ Thus, Dr. Harrington’s data permit much
more detailed and more reliable results than Dr. Robnett’s. The results 
of the studies by the two authors may be exemplified by the following 
figures, with accompanying tables. 


TABLE 1 


Unilateral carcinoma of the breast in women; 
five-year survival rates according to age 


(Harrington, Surgery, Vol. 19 (1946), pp. 154-166) 


                    WITH METASTASIS                  WITHOUT METASTASIS
                        Lived 5 years or more             Lived 5 years or more
                        after operation                   after operation
AGE         Patients                          Patients
(years)     traced      Number   Per cent     traced      Number   Per cent

16-19            1          0        -             4          4      100.0
20-29           45         13      28.9           43         40       93.0
30-39          466        106      22.7          281        219       77.9
40-49         1102        334      30.3          699        571       81.7
50-59         1029        341      33.1          549        403       73.4
60-69          577        189      32.8          356        252       70.8
70-79          125         34      27.2          122         69       56.6
80-87            3          0        -             5          1       20.0

Total         3348       1017      30.4         2059       1559       75.7


³ A personal communication from Dr. Berkson states that now 99 per cent of
the patients have been traced.


207 


STOCHASTIC MODEL FOR FOLLOW-UP STUDY 


[Fig. 1. Five-year survival rate in cancer of the breast, plotted as per cent against age in years, with separate curves for metastasis present and metastasis absent (Harrington, Surgery (1946)). Remaining figure text is not recoverable.]

[Fig. 2. Survival in cancer of the breast, operation alone compared with operation followed by post-operative X-rays (Robnett, Jones, Hazard, Cancer, Vol. 3 (1950)). Remaining figure text is not recoverable.]




TABLE 2 


Five-year results in Portmann Groups I, II and III 


(Robnett, Jones, Hazard, Cancer, Vol. 3 (1950), pp. 757-772) 


                                  OPERATION ONLY            OPERATION + POSTOP. X-RAYS
                             Total  Per cent  Per cent      Total  Per cent  Per cent
GROUP AND NUMBER              no.   of total   of net        no.   of total   of net

Group I (59)
  Total number treated         52     100.0                     7     100.0
  Operative deaths              2       3.8                     0       0.0
  Lost                          4       7.7                     1      14.3
  Dead, unrelated causes        3       5.8                     1      14.3
  Net followed 5 years         43      82.7     100.0           5      71.4     100.0
  Dead of disease               3       5.8       7.0           1      14.3      20.0
  Alive                        40      76.9      93.0           4      57.1      80.0

Group II (71)
  Total number treated         56     100.0                    15     100.0
  Operative deaths              1       1.8                     0       0.0
  Lost                          7      12.5                     2      13.3
  Dead, unrelated causes        5       8.9                     0       0.0
  Net followed 5 years         43      76.8     100.0          13      86.7     100.0
  Dead of disease              20      35.7      46.5           2      13.3      15.4
  Alive                        23      41.1      53.5          11      73.3      84.6

Group III (70)
  Total number treated         41     100.0                    29     100.0
  Operative deaths              1       2.4                     0       0.0
  Lost                          5      12.2                     2       6.9
  Dead, unrelated causes        4       9.8                     2       6.9
  Net followed 5 years         31      75.6     100.0          25      86.2     100.0
  Dead of disease              25      61.0      80.6          22      75.9      88.0
  Alive                         6      14.6      19.4           3      10.3      12.0
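
The "per cent of total" and "per cent of net" columns of Table 2 follow from the raw counts by simple arithmetic. The following minimal Python sketch recomputes them for one cell of the table; the function name and dictionary keys are illustrative choices and do not come from the cited study.

```python
def five_year_summary(treated, operative_deaths, lost, dead_unrelated, dead_of_disease):
    """Recompute the percentage columns of a follow-up table from raw counts."""
    # "Net followed" excludes operative deaths, lost patients and unrelated deaths.
    net = treated - operative_deaths - lost - dead_unrelated
    alive = net - dead_of_disease

    def pct(count, base):
        # Percentage rounded to one decimal, as printed in the table.
        return round(100.0 * count / base, 1)

    return {
        "net_followed": net,
        "alive": alive,
        "lost_pct_of_total": pct(lost, treated),
        "net_pct_of_total": pct(net, treated),
        "dead_of_disease_pct_of_net": pct(dead_of_disease, net),
        "alive_pct_of_net": pct(alive, net),
    }


# Group I, operation only, with the counts printed in Table 2.
group_1_operation_only = five_year_summary(
    treated=52, operative_deaths=2, lost=4, dead_unrelated=3, dead_of_disease=3
)
print(group_1_operation_only)
```

The same call with the counts of any other group and treatment column reproduces the corresponding percentages.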




The two figures refer to different questions, the first to the combined 
effect of age and metastasis and the second to the combined effect of 
category of cancer according to Portmann’s classification and difference 
in treatment. In fact, the studies of Dr. Harrington and Dr. Robnett 
do not include classification of patients based on identical principles. 
The present authors wish it clearly understood that the medical side 
of either inquiry is entirely outside of their domain. 

It is to be emphasized that, despite the difference in volume of 
the material available, Dr. Harrington’s study, as well as that of Dr. 
Robnett, contains comparisons based on small series of observations 
which refer to certain combinations of conditions of the patients. 
Thus, a general conclusion would be that, in present day cancer and 
other medical research, cases arise where a comparison which the medical 
profession considers of sufficient importance to study must necessarily 
be based on small numbers of observations. In these circumstances the 
authors feel justified in undertaking a statistical-theoretical study with
the hope that it may lead to the development of tests of some relevant 
statistical hypotheses. 

Statistical tests apply to observable random variables, the distribu- 
tion of which is at least partly unknown. Thus, the first step towards 
the development of statistical tests must consist in building a stochastic 
model of the phenomena studied within which the observable random 
variables could be defined. Any conceivable mathematical model of any 
phenomena must involve simplifications. The greater the simplifications 
adopted, the further one must be from the actual processes studied. On 
the other hand, while more detailed models may (but need not) approach
the phenomena satisfactorily, they may appear so complex as to lose all 
usefulness. The authors are fully aware of these difficulties and only 
hope that the model discussed below will appear sufficiently simple to 
handle and that, at least in some sections of medical research, it will 
prove sufficiently close to the actual phenomena. 


2. STOCHASTIC MODEL OF RECOVERY, RELAPSE AND DEATH 


Considered operationally, the principle underlying the stochastic 
model described below, coincides with the statement of Berkson [6] as 
follows:


“In essence what is wanted is simple enough. Beginning with a given 
number of patients, what per cent will be alive in 5 years? The group with




which we begin may be variously defined; for instance, it may be the group 
of patients who have been diagnosed to have the malignant lesion, or it may 
be the ones who have undergone operation, or it may be only those who have 
survived operation. The rates for these different groups obviously will be 
different, but so long as the basic group is defined unequivocally, the meaning 
of the rate is clear; it gives the probability for the defined group of surviving 
5 or more years.” 


However, the above general principle must be supplemented if we 
wish to have a complete scheme which will specify the various risks that 
threaten the individuals under consideration and will imply the relation- 
ships that exist among the probabilities of these risks. Thus, we postu- 
late a detailed machinery which permits some of the patients in the 
group to die at an early date, some to recover and others to get lost. 
In operational terms the description of this machinery is as follows:⁴

Let $T$ stand for a period of time, perhaps the period of observation,
of length such that the change in probabilities of the various risks
during $T$ can be neglected. Divide $T$ into a large number $M$ of “elementary”
time intervals $\tau$ so that $M\tau = T$. The elementary time
intervals $\tau$ are postulated to be so short that a given patient exposed
to various risks cannot succumb to more than one during a single time
interval $\tau$. Thus, for example, it is postulated that a patient under
treatment for cancer at the beginning of a time interval $\tau$ cannot during
this interval both recover and die from causes not connected with cancer.
With each patient under consideration we associate a number of possible
“states” in which he may find himself at any given moment. Specifically,
we contemplate the following states:


$S_0$ = initial state with some specific definition as visualized by
Berkson, for example, the state of being under treatment
for cancer.


$S_1$ = state of being dead immediately following treatment for
cancer (death from cancer or operative death).


$S_2$ = state of being alive, not under treatment for cancer and
remaining under observation. For brevity this state will
be described as that of having “recovered” from cancer,
although we appreciate that this “recovery” may be only
an apparent one and that the physician may be well aware
that the patient still suffers from cancer. Another convenient
term to describe $S_2$ is “leading normal life.”


$S_3$ = state of being lost after “recovery,” either through death
not connected with cancer or through difficulties of tracing
the patient.


⁴ See also Neyman [7], pp. 69-95.


Fig. 3 illustrates the scheme of the four states contemplated and 
the possibilities of passage from one state to another. It will be seen 
that we postulate only two ways in which the patient can leave the 


[Fig. 3. System of “states” postulated. The diagram shows, over consecutive time elements, the possible passages among the states: diagnosed cancer ($S_0$); “apparent recovery” or “normal life” ($S_2$); death from other causes or untraced ($S_3$). The remaining labels are not recoverable.]


initial state $S_0$: by dying (transfer to state $S_1$) or by recovering (transfer
to state $S_2$). Also we postulate that the patient can leave state $S_2$
in only one of two ways, by having a recurrence of cancer (transfer to
state $S_0$) or by being lost from observation or by dying from causes not
connected with cancer (transfer to state $S_3$).

We realize, of course, that the postulate of only four states as 
described above and also the postulate of limited passage from one to 
another simplifies matters considerably. In principle, it is easy to 
increase the number of different states. However, this will introduce 
more parameters to be estimated and may make all the calculations 
involved substantially more complicated. 




One approach to reality is relatively easy to make. This consists
in splitting state $S_1$ into two substates, $S_1'$ and $S_1''$, of dying from
cancer proper and of dying from an operation for cancer, respectively.
Similarly, without complicating the model appreciably, state $S_3$ may be
split into $S_3'$, the state of being lost from observation, and $S_3''$, the
state of dying from causes not connected with cancer. However, for
the sake of simplicity, we shall confine our consideration to the four
main states, $S_0$, $S_1$, $S_2$ and $S_3$.

We shall imagine the machinery of passage from one state to another
to be as follows: We visualize that to each state there corresponds a
bag of balls. The bag $B_0$ corresponds to state $S_0$ and contains balls
numbered either zero, one or two. The bag $B_1$, corresponding to state
$S_1$, contains balls, all of which bear the number one. Bag $B_2$, corresponding
to state $S_2$, has balls numbered zero, two or three. Finally,
bag $B_3$ corresponds to state $S_3$ and contains balls with the number three
on them. It will be noticed that the numbers on the balls in each bag
correspond to the postulated possible passages from the corresponding
state to other states. According to the model we consider, a patient
who, at the beginning of a given time element $\tau$, is in a particular state
decides his fate during this element of time by drawing a ball out of
the bag corresponding to this particular state. His fate may be either
to remain in the same state for the rest of the time element $\tau$ or to be
transferred during $\tau$ to another state, according to the number on the
ball drawn. Thus, if at the beginning of $\tau$ the individual is either in
state $S_1$ or in state $S_3$, he draws a ball out of $B_1$ or out of $B_3$. Since
these bags contain balls of only one kind, the fate of the individual
is predetermined and he stays in the same state indefinitely. This
means that bags $B_1$ and $B_3$ can be forgotten. However, if at the
beginning of $\tau$ the individual is either in $S_0$ or in $S_2$, then the ball
drawn either from $B_0$ or from $B_2$, respectively, will decide whether the
individual stays in the same state for the whole of the time element $\tau$
or is transferred to another state.

We have denoted by $M$ the total number of consecutive time elements
forming the basic period of observation $T$. It follows that the fate of
an individual during $T$ is decided by at least one but not more than $M$
consecutive drawings of balls out of the bags $B_0$, $B_1$, $B_2$ and $B_3$.

Even though the fate of an individual during time $T$ may be complicated,
involving several recoveries and subsequent relapses, the above
scheme permits us to calculate the probabilities of his fate. The formulae




for these probabilities depend on $M$. However, since it is clear that the
division of time $T$ into equal elementary time intervals $\tau$ will be the less
objectionable the smaller the time elements $\tau$, we make $M$ tend to infinity
and $\tau$ to zero after we obtain the desired formulae. The limiting
formulae are then treated as the probabilities of the contemplated fate
of the individuals. In this passage to the limit we have to make continuous
changes in the contents of the bags $B_0$ and $B_2$ in order to
conform to the intuitive idea that, as the length of $\tau$ decreases, the
probability of a transfer from the given state will decrease accordingly.

The system of bags with balls and drawings by an individual to
determine his fate for a given element of time represent the “operational”
description of the model studied. In actual fact, the deduction
of formulae for relevant probabilities can be made in a simpler way,
details of which are given in the appendix. Here it will suffice to state
that, if we ignore the subdivision of states $S_1$ and $S_3$ into substates $S_1'$,
$S_1''$ and $S_3'$, $S_3''$, respectively, then all the probabilities of combinations
of transfers from one state to another within a given time $T$ depend
on the duration of $T$ and on four constants, which we denote by $q_{01}$, $q_{02}$,
$q_{20}$ and $q_{23}$. These constants are related to the frequency of transfers
between states indicated by the subscripts. Thus, the greater $q_{01}$, the
more frequent are the transfers from state $S_0$ to $S_1$. Further, if $q_{ij} = 0$,
then transfers from $S_i$ to $S_j$ are impossible. However, the constants $q_{ij}$
are not probabilities and their values may exceed unity. Because of
this fact and because of the relation between the value of $q_{ij}$ and the
frequency of transfers $S_i \to S_j$, each constant $q_{ij}$ will be called the
intensity of risk $S_i \to S_j$.
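
The passage to the limit can be illustrated numerically. The sketch below assumes, as one natural reading of the bag scheme that is not spelled out in this section, that during an elementary interval of length tau the proportion of balls in a bag calling for a transfer from state i to state j equals q_ij * tau. The function names and the intensity values are hypothetical and serve only as an illustration.

```python
def step_matrix(q01, q02, q20, q23, tau):
    """One drawing from the bags: transition probabilities for one interval tau.

    Assumption (not from the source): the chance of a transfer i -> j in one
    elementary interval is q_ij * tau. Rows and columns are S0, S1, S2, S3.
    """
    return [
        [1.0 - (q01 + q02) * tau, q01 * tau, q02 * tau, 0.0],
        [0.0, 1.0, 0.0, 0.0],  # S1 is final: bag B1 holds only balls marked one
        [q20 * tau, 0.0, 1.0 - (q20 + q23) * tau, q23 * tau],
        [0.0, 0.0, 0.0, 1.0],  # S3 is final: bag B3 holds only balls marked three
    ]


def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(a, m):
    """m-th power of a square matrix by repeated squaring."""
    n = len(a)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    while m > 0:
        if m & 1:
            result = mat_mul(result, a)
        a = mat_mul(a, a)
        m >>= 1
    return result


def fate_probabilities(q01, q02, q20, q23, T, M):
    """Probabilities of being in each state after M drawings covering time T."""
    return mat_pow(step_matrix(q01, q02, q20, q23, T / M), M)


if __name__ == "__main__":
    # Hypothetical intensities per year; the row for S0 settles down as M grows.
    for M in (10, 100, 10_000):
        row_s0 = fate_probabilities(0.3, 1.2, 0.4, 0.1, T=1.0, M=M)[0]
        print(M, [round(p, 5) for p in row_s0])
```

As M grows, successive rows differ less and less, which is the behaviour the limiting formulae are meant to capture.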

As we have mentioned, there is no essential difficulty in retaining in
the model substates $S_1'$ and $S_1''$ and/or substates $S_3'$ and $S_3''$. If we do
so, then instead of two intensities of risk, $q_{01}$ and $q_{23}$, it will be necessary
to consider four, namely $q_{01}'$, $q_{01}''$, $q_{23}'$ and $q_{23}''$. The new intensities
of risk are connected with the ones considered previously by the equations,

$$q_{01} = q_{01}' + q_{01}'', \qquad q_{23} = q_{23}' + q_{23}'',$$

and the complication of the problem due to substates $S_1'$, $S_1''$, $S_3'$ and
$S_3''$ is minor.


Remark. In general, intensities of risk are functions of time. How- 
ever, in the present model, the intensities are regarded as constants in 




order to give effect to the hypothesis that the period of observation $T$
is sufficiently short so that changes in them may be neglected. In the
operational description of the model, this hypothesis corresponds to the
assumption that the composition of balls in bags $B_0$ and $B_2$ remains the
same for all elements of time considered.

With the mathematical details relegated to the appendix, this com- 
pletes the description of the probabilistic model of variability in the 
fate of individuals belonging to a uniform group who at time t = 0 are 
in any given state. Further developments depend on the questions one 
may wish to answer. There are two different categories of questions. 
One category relates to the information one may wish to extract from 
the model when all intensities of risk $q_{ij}$ are known. The other category
of questions refers to the system of observations on actual patients— 
what system will lead to reliable estimates of intensities of risk at the 
least expense and labor for the investigator. Although this is a logical 
classification of the problems to be considered and will be kept in mind, 
it seems simpler to begin the presentation of results with problems 
suggested by the present day methods as illustrated in the works of Drs. 
Harrington and Robnett and, then, to proceed to other possibilities. 


3. FIRST METHOD OF APPROACH 


In this section we shall consider the category of problems suggested
by the data of current follow-up studies, with only slight modifications.
The usual observations refer to a group of some $N$ individuals who at
$t = 0$ are in an original state $S_0$. As Berkson mentioned, this state
may be variously defined, but for the sake of definiteness we shall refer
to $S_0$ as the state of “being under treatment for cancer.” At a subsequent
time $T$ the individuals originally in $S_0$ appear distributed
between four states $S_0$, $S_1$, $S_2$ and $S_3$ and the data give the numbers
$N_{0i}$ of these who are found in state $S_i$, for $i = 0, 1, 2, 3$. Actually,
some studies, including those of Dr. Harrington and Dr. Robnett, are
less detailed and, instead of giving the numbers $N_{00}$ and $N_{02}$ separately,
give only their sum $N_{00} + N_{02}$, representing the number of all individuals
of the original group known to be alive at time $T$. It is understood,
however, that efforts are being made to separate this broad category
(that is, $S_0 + S_2$) into two separate categories $S_0$ and $S_2$.⁵


⁵ See, for example, [8], especially p. 7.




If the only data available for the estimation of intensities of risk
are the numbers $N_{00}$, $N_{01}$, $N_{02}$ and $N_{03}$, the first problem that suggests
itself is the calculation of the expectations of $N_{0i}$, say $N Q^*_{0i}(0, T)$.
These expectations could be compared with the values of $N_{0i}$ actually
observed and used to deduce estimates of intensities of risk. The symbol
$Q^*_{0i}$ requires more detailed explanation. Actually, we are going to
consider a slightly more general symbol, denoted by $Q^*_{ij}(t_1, t_2)$, which
stands for the probability that an individual in state $S_i$ at time $t_1$ will
be found in $S_j$ at the subsequent time $t_2$. Here $i$ and $j$ may be equal or
not and may assume any values 0, 1, 2, 3.

The probability $Q^*_{ij}(t_1, t_2)$ is called the crude rate of risk $S_i \to S_j$
during the time interval from $t_1$ to $t_2$. After formulae for the crude
rates have been deduced, it is found (as one would guess beforehand)
that each crude rate $Q^*_{ij}(t_1, t_2)$ depends not on $t_1$ and $t_2$ taken separately,
but only on their difference, say $t = t_2 - t_1$, provided, of course, that
the two moments $t_1 < t_2$ are both within the same time interval $(0, T)$
for which the intensities of risk are constant. For this reason, the
notation $Q^*_{ij}(t_1, t_2)$ can be simplified to $Q^*_{ij}(t)$ where $t$ stands for the
length of the time interval beginning with $t_1$ and ending with $t_2$.

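The formulae expressing the crude rates of risk in terms of time $t$
and the intensities $q_{ij}$ are somewhat complicated. In order to simplify
the writing, it is convenient to define two constants, say

$$\lambda_1 = \tfrac{1}{2}\left\{ q_{01} + q_{02} + q_{20} + q_{23} - \sqrt{(q_{01} + q_{02} - q_{20} - q_{23})^2 + 4 q_{02} q_{20}} \right\},$$

$$\lambda_2 = \tfrac{1}{2}\left\{ q_{01} + q_{02} + q_{20} + q_{23} + \sqrt{(q_{01} + q_{02} - q_{20} - q_{23})^2 + 4 q_{02} q_{20}} \right\}.$$

These constants then serve to define functions of time which we denote
by $X$ and $Y$,

$$X = \frac{e^{-\lambda_1 t} - e^{-\lambda_2 t}}{\lambda_2 - \lambda_1}, \qquad Y = \tfrac{1}{2}\left\{ e^{-\lambda_1 t} + e^{-\lambda_2 t} \right\}.$$

Finally, we may write

$$\begin{aligned}
Q^*_{00}(t) &= -\tfrac{1}{2}(q_{01} + q_{02} - q_{20} - q_{23})\,X + Y, \\
Q^*_{01}(t) &= q_{01} X + \frac{q_{01}(q_{20} + q_{23})}{\lambda_1 \lambda_2}\left[1 - \tfrac{1}{2}(q_{01} + q_{02} + q_{20} + q_{23})\,X - Y\right], \\
Q^*_{02}(t) &= q_{02} X, \\
Q^*_{03}(t) &= \frac{q_{02}\, q_{23}}{\lambda_1 \lambda_2}\left[1 - \tfrac{1}{2}(q_{01} + q_{02} + q_{20} + q_{23})\,X - Y\right].
\end{aligned} \tag{1}$$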




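These are the crude rates of transfer from state $S_0$. The crude rates
of transfer from $S_2$ have similar expressions.

$$\begin{aligned}
Q^*_{20}(t) &= q_{20} X, \\
Q^*_{21}(t) &= \frac{q_{01}\, q_{20}}{\lambda_1 \lambda_2}\left[1 - \tfrac{1}{2}(q_{01} + q_{02} + q_{20} + q_{23})\,X - Y\right], \\
Q^*_{22}(t) &= \tfrac{1}{2}(q_{01} + q_{02} - q_{20} - q_{23})\,X + Y, \\
Q^*_{23}(t) &= q_{23}\left( X + \frac{q_{01} + q_{02}}{\lambda_1 \lambda_2}\left[1 - \tfrac{1}{2}(q_{01} + q_{02} + q_{20} + q_{23})\,X - Y\right] \right).
\end{aligned} \tag{2}$$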
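
The crude rates can be evaluated directly once the four intensities are given. The following Python sketch is an illustration only: the function name is hypothetical, the expression used for the function X and the prefactors 1/(lambda_1 * lambda_2) are reconstructed from the model with constant intensities rather than copied from the printed page, and the code assumes lambda_1 differs from lambda_2 and that their product is positive.

```python
import math


def crude_rates(q01, q02, q20, q23, t):
    """Crude rates of transfer out of S0 and out of S2 after time t.

    Returns (from_s0, from_s2), each a list of probabilities of being found
    in S0, S1, S2, S3. Assumes lam1 != lam2 and lam1 * lam2 > 0.
    """
    s = q01 + q02 + q20 + q23
    d = q01 + q02 - q20 - q23
    root = math.sqrt(d * d + 4.0 * q02 * q20)
    lam1, lam2 = 0.5 * (s - root), 0.5 * (s + root)

    e1, e2 = math.exp(-lam1 * t), math.exp(-lam2 * t)
    x = (e1 - e2) / (lam2 - lam1)
    y = 0.5 * (e1 + e2)
    # Common bracket [1 - (1/2)(q01 + q02 + q20 + q23) X - Y] / (lam1 * lam2).
    z = (1.0 - 0.5 * s * x - y) / (lam1 * lam2)

    from_s0 = [
        -0.5 * d * x + y,
        q01 * x + q01 * (q20 + q23) * z,
        q02 * x,
        q02 * q23 * z,
    ]
    from_s2 = [
        q20 * x,
        q01 * q20 * z,
        0.5 * d * x + y,
        q23 * x + q23 * (q01 + q02) * z,
    ]
    return from_s0, from_s2


if __name__ == "__main__":
    # Hypothetical intensities per year, observed after one year.
    rates_s0, rates_s2 = crude_rates(0.3, 1.2, 0.4, 0.1, t=1.0)
    print("from S0:", [round(p, 4) for p in rates_s0], "sum", round(sum(rates_s0), 6))
    print("from S2:", [round(p, 4) for p in rates_s2], "sum", round(sum(rates_s2), 6))
```

A useful check on any implementation is that the four crude rates out of a given state add up to one for every t, since the patient must be found in exactly one of the four states.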

The formulae just given refer to the problem of estimating intensities
of risk $q_{ij}$ from observed values $N_{i0}$, $N_{i1}$, $N_{i2}$ and $N_{i3}$, $i = 0, 2$.
On the other hand, the probabilities $Q^*_{ij}$ by themselves cannot conveniently
serve for the evaluation of the intensity of disease in a category
of patients or the evaluation of the relative effectiveness of different
treatments. The probabilities $Q^*_{ij}$ were termed “crude” rates of risk
and the adjective “crude” is used to emphasize that the transfer contemplated
(from $S_i$ to $S_j$) is affected by other risks competing with this
particular risk. Thus, for example, the probability $Q^*_{01}(t)$ is computed
under the express assumption that, during time $t$, an individual originally
in $S_0$ is subject to the risk of being transferred to $S_2$ and thence to $S_3$
(being dead from causes not connected with cancer or being lost).
Naturally, the presence of the risk of transfer $S_0 \to S_2 \to S_3$ diminishes
the risk of transfer $S_0 \to S_1$. Furthermore, this decrease in the risk
$S_0 \to S_1$ is artificial since it depends on the frequency of deaths from
other causes and on the frequency of losses and has nothing to do with
the danger of death from cancer. Thus, in order to evaluate the latter
danger, we compute what we call the net rate of this particular risk.

In general, let $S_i \to S_j$, $S_a \to S_b$, $S_c \to S_d$, $\ldots$ be the possible
transfers within a system of states. Then the probability of transfer
$S_i \to S_j$ during time $t$, computed on the assumption that $q_{ab} = q_{cd}
= \cdots = 0$, is called the net risk of transfer $S_i \to S_j$ with risks $S_a \to S_b$,
$S_c \to S_d$, $\ldots$ eliminated. This net risk is denoted by $P_{ij|ab,cd,\ldots}(t)$.

In evaluating the effectiveness of two methods of treating cancer,
it is natural to consider the net risk of death from cancer with the risk
of being lost or of dying from other causes eliminated. This rate is
then $P_{01|23}(t)$. The formula for $P_{01|23}(t)$ is obtained directly from
that for $Q^*_{01}(t)$ by substituting in the latter $q_{23} = 0$.
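
The substitution can be carried out numerically. The sketch below evaluates the crude rate of death from cancer and the corresponding net rate side by side; the function names and intensity values are hypothetical, the closed form used for the crude rate is a reconstruction for the model with constant intensities, and the code assumes that both q01 and q20 are positive so that no division by zero occurs when q23 is set to zero.

```python
import math


def crude_risk_of_cancer_death(q01, q02, q20, q23, t):
    """Crude rate Q*_01(t); assumes lam1 != lam2 and lam1 * lam2 > 0."""
    s = q01 + q02 + q20 + q23
    root = math.sqrt((q01 + q02 - q20 - q23) ** 2 + 4.0 * q02 * q20)
    lam1, lam2 = 0.5 * (s - root), 0.5 * (s + root)
    e1, e2 = math.exp(-lam1 * t), math.exp(-lam2 * t)
    x = (e1 - e2) / (lam2 - lam1)
    y = 0.5 * (e1 + e2)
    return q01 * x + q01 * (q20 + q23) * (1.0 - 0.5 * s * x - y) / (lam1 * lam2)


def net_risk_of_cancer_death(q01, q02, q20, t):
    """Net rate P_{01|23}(t): the crude formula with q23 set to zero."""
    return crude_risk_of_cancer_death(q01, q02, q20, 0.0, t)


if __name__ == "__main__":
    # Hypothetical intensities per year, with a sizeable risk of loss q23.
    for years in (1.0, 5.0):
        crude = crude_risk_of_cancer_death(0.3, 1.2, 0.4, 0.5, years)
        net = net_risk_of_cancer_death(0.3, 1.2, 0.4, years)
        print(years, round(crude, 4), round(net, 4))
```

With a positive risk of loss the net rate exceeds the crude rate, because patients who would have been lost remain exposed to relapse and death from cancer.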




The net risk, $P_{01|23}(t)$, corresponds to the old established criterion
of effectiveness of treatment, namely to the probability of death from
the disease during a specified time $t$ when other causes are eliminated.
However, the formula resulting from our analysis is substantially more
complicated than that in common use, labeled “the actuarial method.”
In our notation, the latter formula may be written as, say,

$$D_{01}(t) = \frac{Q^*_{01}(t)}{[\ldots]}.$$

It appears that $D_{01}(t)$ is always less and occasionally substantially less
than the net risk $P_{01|23}(t)$. The concept of net rate of risk and the
appropriate formula for its computation are solutions of a problem of
evaluation either of the severity of a disease or of the effectiveness of a
treatment.

While the frequency of surviving a specified period of time is un- 
doubtedly an important characteristic of a method of treating a disease 
like cancer, we are inclined to think that it can be usefully supplemented 
by some other characteristics. One of these is the expected length of 
normal life within a specified period of time, say $t$, following a given
treatment. In thinking of friends of ours who have had the misfortune 
of suffering from cancer, we remember cases of widely different kinds. 
In some cases, the original diagnosis and operation were followed by a
long uninterrupted period of unspeakable suffering, culminating in 
relieving death. In other cases, the person concerned had prolonged 
periods of normal life during which he went about, worked and enjoyed 
life. Then, there would be a relapse, with another operation and stay 
in the hospital, which would be followed by another substantial period 
of apparently good health. We emphasize the word “ apparently ” 
appended to “ good health.” It is possible that the patient last described 
was continuously ill with cancer no less than the one confined to bed. 
However that may be, there is a very important difference in the two 
situations and, to us, this difference appears to be a substantial element 
in the comparison of various methods of treatment and in the classi- 
fication of different types of the disease. In other words, if two different 
methods of treatment result in about equal net survival rates but in 
one case the average length of “ normal life” of a patient exceeds that 
in the other, then, or so it appears to us, the two treatments are of 
unequal value. One is even tempted to assume the point of view that 
it is the average length of normal life following a treatment rather than 




mere survival that is the appropriate measure of the successfulness of a 
treatment. Admittedly, this is the point of view of laymen in medicine. 
However, it was a pleasure to find a passage in Dr. Harrington’s paper 
which suggests at least a partial conformity with the view adopted by us. 
This passage ([5], pp. 7, 8) reads: 

“I have accepted for operation all patients to whom, I felt, there was a
reasonable chance of offering comfort or greater length of life as well as those 
whose disease, I felt, stood a reasonable chance of being cured. It may seem 
that these rules of operability have not been drawn strictly enough and that 
cases have been accepted for operation in which the growth is too extensive. 
This is a matter of opinion, however, and justification has been found in 
many cases in which the condition was thought to be absolutely hopeless 
before operation but in which the patients have lived to enjoy many years 
of comfort.” 


To our knowledge, the average length of normal life following a 
treatment or following an initial diagnosis has not figured in statistical 
studies in medicine thus far. There are various possible reasons for this. 
One is the difficulty in establishing a specific criterion to decide whether 
or not at a given moment a person leads a “normal life.” Another 
difficulty is connected with direct observation of a period of normal life: 
when did such a period begin? when exactly did it end? 

The first of these difficulties is essential. If it is found impossible 
to overcome, then in follow-up studies, it would be impossible to subdivide
the group of individuals living at the moment $T$ (that is, the
group in $S_0 + S_2$ combined) into separate groups $S_0$ and $S_2$. The document
quoted above encourages us to think that some sort of convention
can be adopted to distinguish the two groups, perhaps with modified 
labels. For example, it may be agreed to consider that, if on a particular 
day the patient was able to and actually did leave his home, without 
help, either for business or for recreation, then, on this particular day 
the patient led a “normal life,” irrespective of the fact that, say, a week
later an operation became necessary and a substantial tumor was discovered.
We wish to emphasize that we do not insist on any particular
convention, but only that the differences in the way particular patients 
survive treatment make it necessary to adopt some conventional definition 
of “leading normal life ” or of some such convenient term. 

The category of patients “leading normal life” belongs to the 
phenomenal sphere of thought. Within our mathematical model it 
corresponds to state $S_2$. We wish to point out that, if an appropriate




convention is adopted to decide whether or not a given patient led a 
normal life on the day he was interviewed in the follow-up procedure, 
then this would be sufficient to estimate the average or the expected 
length of normal life within a specified period such as a year. The 
actual estimation of the length of the periods of uninterrupted normal 
life is not strictly necessary. 

Let $e_{02}(T)$ stand for the expected period of normal life within the
period of time from $t = 0$ to $t = T$ for a patient who at time $t = 0$ is
in state $S_0$ and for whom the inessential (for the problem at hand)
risks of loss from observation and of death from causes other than cancer
are removed. Let further $P_{02|23}(t)$ be the net rate of risk $S_0 \to S_2$ in
time $t$ with the risk $S_2 \to S_3$ eliminated. Then

$$e_{02}(T) = \int_0^T P_{02|23}(t)\,dt$$

and it is obvious that this formula can be evaluated numerically provided
we have the estimates of the parameters $q_{01}$, $q_{02}$ and $q_{20}$. The above
formula for $e_{02}(T)$ gives the expected length of normal life during the
first period of observation, say one year, following the initial diagnosis.
A similar formula holds for any subsequent period of length $T$ during
which the parameters $q_{ij}$ may have values different from those in the
first. Also a different but similar formula holds for patients who at
the beginning of a given period of observation are found in state $S_2$.
A series of such expectations computed for a number of successive years,
combined with net rates of risk, would then provide the total expected
length of normal life during the years following $t = 0$.

Operationally, $e_{i2}(T)$, that is, the expected length of normal life
during time $T$ of an individual originally in state $S_i$ (either $S_0$ or $S_2$),
is interpreted as follows: Imagine a large number of individuals of a
specified category who originally are in state $S_i$ and for whom the
inessential risk $S_2 \to S_3$ has been eliminated. Observe these individuals
for time $T$ and measure each period of normal life that any individual
has before the expiration of time $T$. Add the lengths of all such periods
and divide the total by the number of individuals. Then, $e_{i2}(T)$ is the
idealization of the ratio so obtained.
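
The expected length of normal life for a patient starting in the initial state can be evaluated numerically as the time integral of the net probability of being found in the state of normal life. The sketch below is a minimal illustration under stated assumptions: the function names and intensity values are hypothetical, the risk of loss is set to zero as the definition requires, the closed form for the net probability is a reconstruction for constant intensities, and the integral is approximated with composite Simpson's rule.

```python
import math


def net_p02(q01, q02, q20, t):
    """Net probability of being in S2 at time t, starting in S0, with q23 = 0.

    Assumes q02 * q20 > 0, so that the two constants lam1 and lam2 differ.
    """
    s = q01 + q02 + q20
    root = math.sqrt((q01 + q02 - q20) ** 2 + 4.0 * q02 * q20)
    lam1, lam2 = 0.5 * (s - root), 0.5 * (s + root)
    return q02 * (math.exp(-lam1 * t) - math.exp(-lam2 * t)) / (lam2 - lam1)


def expected_normal_life(q01, q02, q20, T, steps=1000):
    """Integrate net_p02 over (0, T) with composite Simpson's rule."""
    if steps <= 0 or steps % 2 != 0:
        raise ValueError("steps must be a positive even integer")
    h = T / steps
    total = net_p02(q01, q02, q20, 0.0) + net_p02(q01, q02, q20, T)
    for k in range(1, steps):
        weight = 4.0 if k % 2 == 1 else 2.0
        total += weight * net_p02(q01, q02, q20, k * h)
    return total * h / 3.0


if __name__ == "__main__":
    # Hypothetical intensities per year; result is in years within the first year.
    print(round(expected_normal_life(0.3, 1.2, 0.4, T=1.0), 4))
```

The result is expressed in the same time unit as T, and it can never exceed T, since it is an average of time spent in one state during the period.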

In addition to the expectations $e_{i2}(T)$, the similarly defined expectations
$e_{i0}(T)$ of time of illness from cancer within time $T$ may be of
interest. The total, say

$$e_{i\cdot}(T) = e_{i0}(T) + e_{i2}(T),$$

represents the average survival time within the time interval $(0, T)$.

The main purpose of deducing formulae for crude rates of risk is to
provide a means of estimating the intensities of risk by comparing the
observed numbers $N_{0i}$ with their expectations $N Q^*_{0i}(T)$. In this connection
it is appropriate to point out a difficulty which arises during
the first period of observation, say during the first year (if $T$ = one year)
of the follow-up procedure. Imagine that observations have been made
and that, after twelve months, out of the $N$ persons originally under
treatment for cancer, exactly $N_{01}$ were dead from cancer, exactly $N_{02}$
were leading normal life and exactly $N_{03}$ were untraced or dead from
other causes. The remainder, that is, $N_{00} = N - N_{01} - N_{02} - N_{03}$,
were under treatment for cancer. For the determination of intensities
of risk, we have the following equations,

$$N Q^*_{0i}(T) = N_{0i}, \qquad i = 0, 1, 2, 3. \tag{3}$$


The total number of equations is four, exactly as many equations as
unknowns, $q_{01}$, $q_{02}$, $q_{20}$ and $q_{23}$. Unfortunately, however, equations (3)
are not independent because, as it is easy to verify by inspecting the
formulae (1) for the crude rates of risk,

$$Q^*_{00} + Q^*_{01} + Q^*_{02} + Q^*_{03} = 1,$$

and, by adding the four equations (3), we obtain the identity $N = N$.
Thus, from (3), we can select any group of three equations and 
use it to determine three unknowns, but not four. It follows that, 
in order to estimate four intensities of risk, some supplementary informa- 
tion regarding the group of patients considered must be obtained. 
This may be achieved in various ways. One way is to subdivide
the first period $T$ of observation (during which we assume the intensities
of risk to be constant) into two equal parts. If $T = 12$ months, this
would mean that information regarding all patients would be collected
after six months and again after twelve months from the beginning
of the period of observation. At the conclusion of the first six months
period, a certain number of patients will be found in state $S_2$. By
observing these individuals during the subsequent six months, figures
will be obtained representing the empirical counterparts of the crude
rates $Q^*_{2j}(\tfrac{1}{2}T)$ and this will provide a sufficient number of independent
equations to estimate the four intensities of risk.

The difficulty just described refers to the first period of observation 
but not to subsequent ones. The reason is obvious: at the start of each 
period of observation beginning with the second, one part of the survivors 
of the original $N$ individuals will be in $S_0$ and another part in $S_2$. The
first of these groups will provide three independent observations to be
compared with $Q^*_{0j}(T)$ and the second, another three independent
observations to be compared with $Q^*_{2j}(T)$.

Details of the various aspects of the problem of estimation are 
relegated to another publication. Some preliminary results are given 
in Section 6. 


4. NUMERICAL ILLUSTRATION 


Now it appears appropriate to illustrate on some numerical data the 
working of the mathematical model just described. This is done in 
Table 3. 

Table 3 was constructed to illustrate several points. One point is 
that two substantially different systems of values of intensities of risk 
may lead to the same net rates of death from cancer but to very different 
expectations of normal life. The purpose, then, is to emphasize the 
desirability of using expectation of normal life in the evaluation of 
effectiveness of treatments, at least as a supplement to net rate of death 
from the disease. 

With this in mind, two systems of intensities of risk $q_{ij}$ are considered in
Table 3. They were selected to produce crude rates of the four risks 
corresponding roughly to actual observations during the first five-year 
period of a follow-up study which we had occasion to see. Since the 
period of five years appears too long for the assumption that intensities 
of risks are constant and since the observations did not include the 
necessary category $S_2$, no effort was made to reproduce the observed
figures exactly and intensities of risk are given with precision to one 
decimal only. As a result, the net rates of death from cancer, denoted 
in Table 3 by $P_{01}$, do not exactly coincide (.324 in one case and .287
in the other). However, in both cases the rates are approximately 
equal to 30 per cent, which we consider sufficient for the purpose of 
illustration.

[TABLE 3. Rates of risks and expectations of life for two systems of
intensities of risk. The body of the table is not recoverable from the source.]

The last column of Table 3 gives the expectations of life,
during time period $T$, first in state $S_0$ (under treatment for cancer),
next in state $S_2$ (leading normal life) and, finally, the total expectancy.
The important point to notice is that, with the first system of intensities 
of risk, the expectancy of normal life for an individual originally in state
$S_0$ is $e_{02}(T) = .583T$ while with the second system of intensities it is
only $e_{02}(T) = .122T$ or, roughly, one fifth of the first. This is the
situation whose possibility was suggested to us by the experience of some
of our friends. If the difference between the two situations reflected
in the two systems of constants $q_{ij}$ is due to a difference in treatment
then, in spite of the fact that the first system has a slightly higher net
mortality rate from cancer, we would favor it rather than the second.
The reason for this preference is further illustrated by the net rates of
risk $P_{2j}$ for those persons who are in state $S_2$ leading a normal life at
the beginning of a given period of observation. It will be seen that
chances for these individuals of surviving $T$ in good health ($P_{22}$) are
substantially higher for the first system of $q_{ij}$ than for the second. In
actual practice, the first system of intensities of risk might possibly 
correspond to a more radical method of operation than the second, 
perhaps with a substantially higher rate of mortality during the operation. 
This higher mortality rate is then compensated by improved chances of 
subsequent complete recovery. However, this is only a suggestion of a 
possible interpretation. As was said before, the medical side of the 
problem is entirely beyond the competence of the present authors and 
their purpose is solely to indicate the various logical possibilities. 

In addition, Table 3 was constructed to illustrate another point. 
This is the subdivision of the first period of observation into two equal 
parts, as suggested at the conclusion of the preceding section. After 
the first half of the period, the observations would yield the empirical 
counterpart of the crude rates exhibited in the second column of Table 3. 
It follows that, given the first system of intensities, about 57 per cent of 
the original patients would be leading normal life (state $S_2$) and about
5 per cent would be under treatment (state $S_0$). Then, observation of
these two groups during the second half period would yield the empirical
counterparts of crude rates in the two following columns of Table 3,
one a repetition of column 2 and the other the crude rates $Q^*_{2j}(\tfrac{1}{2}T)$.

The last circumstance provides the possibility of estimating all four 
intensities of risk characterizing the first period of observation. Now 
we wish to point out another important use of observations which relate 
to separate sections of a period $T$ when it is presumed that the model
constructed is applicable to the whole period. Such observations relating
to two or more subsections of period $T$ may be used to verify the adequacy
of the model. One such possibility is the comparison of the crude rates
of risk appropriate to the same category of individuals (either the rates
for those in $S_0$ or for those in $S_2$) which have been observed during the
several subsequent sections of $T$. If the value of a given crude rate
varies significantly from one section to another, then the model is not 
adequate and, to serve a useful purpose, must be modified. 
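
The comparison of one crude rate across two sections of $T$ can be sketched as a two-sample test of proportions. The paper does not prescribe a particular test, so the pooled-variance $z$ statistic below is an assumption chosen only for illustration, and the function name `two_period_z` is hypothetical. It presumes the two sections have equal length and that the counts refer to individuals in the same state at the start of each section.

```python
import math

def two_period_z(events_a, at_risk_a, events_b, at_risk_b):
    """z statistic for equality of one crude rate in two equal-length sections.

    events_* : number of individuals found in the given end state
    at_risk_*: number of individuals in the given start state
    """
    rate_a = events_a / at_risk_a
    rate_b = events_b / at_risk_b
    pooled = (events_a + events_b) / (at_risk_a + at_risk_b)
    # Standard error under the hypothesis that both sections share one rate.
    se = math.sqrt(pooled * (1 - pooled) * (1 / at_risk_a + 1 / at_risk_b))
    return (rate_a - rate_b) / se
```

A value of the statistic that is large in absolute magnitude, by whatever conventional standard is adopted, would suggest that the assumption of constant intensities over the whole of $T$ is doubtful.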

There are two principal respects in which we feel the model may 
appear defective. One danger is that the period T as observed may be 
too long to satisfy the assumption that the intensities of risk are constant. 
The other involves the assumption that intensities of risk do not vary 
from one patient to another, or, to put it differently, that the classifica- 
tion of cases of a particular disease is sufficiently fine so that the 
unavoidable residual variability can be ignored. It is natural to post- 
pone the analysis of these possibilities until such time as empirical data 
are available to study the adequacy of the model now considered. 


5. SECOND METHOD OF APPROACH 


The method of approach described in Section 3 was dictated by the 
customary scheme of follow-up studies. All formulae for crude rates 
of risk given in that section are of a certain complexity and, although 
they provide the possibility of estimating intensities of risk, the com- 
putations involved are messy. More important than this, a preliminary 
study indicates that, in order to attain a desirable level of precision of 
the estimates, the number N of individuals in the original group of 
patients must be very substantial. Considerations of this kind suggest 
the desirability of some modification in the follow-up procedure which 
would lead (a) to simpler formulae and (b) to more precise estimates. 

It must be clear with reference to point (b) that, if more precise 
estimates are desired without increasing the number of observations and 
if the method of estimation used cannot be improved, then the only 
prospect of success is connected with the possibility of obtaining more 
information about every individual concerned. 

We have seen that the values of the four intensities of risk, $q_{01}$, $q_{02}$,
$q_{20}$ and $q_{23}$, determine all details of the mass events in the model studied,
such as, for example, the average duration of normal life. Yet, in order 
to estimate these intensities, the customary scheme of observations does 
not involve any direct information regarding the duration of any period
of normal life. Instead we contemplate the use of records giving the
number of persons who at specified moments are either dead from cancer, 
under treatment, leading normal life or lost (dead from other causes). 
Thus, information regarding the duration of periods of illness or of 
normal life, implicit in the values of the $q_{ij}$, is to be derived from
indirect observation, that is, from data regarding a few single moments.
In these circumstances, it is not surprising that the volume of observations
must be large in order to evaluate the intensities $q_{ij}$ with precision.

An obvious method of improving the estimates of intensities of risk 
without increasing the number of individuals concerned is to obtain 
information about the duration of some specified phases in the fate of 
at least a part of the individuals considered. In this we encounter the 
difficulty, already mentioned, that even if it is possible to decide that 
on a specified day a given individual leads a normal life (is under treat- 
ment for cancer), it is likely to be impracticable to determine the exact 
beginning and the exact end of this period of normal life (period of 
illness). Our hopes of extracting more information with regard to 
individual cases are limited to the following: 

We presume that of the $N$ individuals originally in $S_0$ (under treatment)
the follow-up procedure will be able to establish the number $N\phi_{01}$
of those who died before the expiration of the period of observation
without having had any period of normal life. We presume further that
the follow-up procedure can establish perhaps two other numbers, say
$N\phi_{00}$ and $N\phi_{02}$. Here $N\phi_{00}$ denotes the number of patients, originally
in $S_0$, who at the conclusion of the period of observation are still in $S_0$
without having had any period of normal life. Similarly, $N\phi_{02}$ stands
for the number of persons, originally in $S_0$, who before the conclusion
of the period of observation recovered (transferred to $S_2$) and, following
this transfer, remained in $S_2$ without having any relapse after the initial
recovery. It will be noticed that it is much easier to establish whether
or not a given individual belongs to any one of these three categories
than to determine the exact duration of a period of normal life.

With regard to individuals who at the beginning of the period of
observation are in state $S_2$, we hope that it may be possible to establish
the numbers in the following categories: The category easiest to determine
is composed of individuals who at the termination of the period
are still in $S_2$ without having suffered a relapse. The symbol $\phi_{22}$ denotes
the corresponding relative frequency. A second category with relative
frequency $\phi_{20}$ is composed of persons, originally in $S_2$, who suffered
a single relapse and at the conclusion of the period of observation are
in $S_0$. Finally, we expect it to be possible to establish the relative
frequency $\phi_{21}$ of cases where an individual, originally in $S_2$, suffered a
single relapse and subsequently died from the disease before the expiration
of the period of observation.

It is intuitively clear that if the relative frequencies enumerated 
above, $\phi_{00}$, $\phi_{01}$, $\phi_{02}$, $\phi_{22}$, $\phi_{20}$ and $\phi_{21}$, or at least some of them, could
be combined with observations of the numbers of patients in $S_0$, $S_1$, $S_2$,
$S_3$ at the conclusion of the period of observation, then the precision
of estimates of the four intensities of risk would be increased. It so 
happens that the formulae for the probabilities representing the 
theoretical counterparts of the relative frequencies just described are 
very simple. These formulae are deduced in the appendix. 

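Let $i$ stand for either 0 or 2 and let $j = 0, 1, 2, 3$. Also let $k$ be a
non-negative integer. Then the symbol $Q_{ij.k}(t)$ represents the probability
that an individual, originally in state $S_i$, will be transferred out
of the state exactly $k$ times during the subsequent $t$ units of time and
at the end of the $t$ units will be found in $S_j$. Obviously $Q_{00.0}(T)$ is the
theoretical counterpart of $\phi_{00}$.

There is a similar correspondence between $Q_{01.1}(T)$ and $\phi_{01}$, between
$Q_{02.1}(T)$ and $\phi_{02}$, etc. The formulae for these probabilities are

\[
\begin{aligned}
Q_{00.0}(t) &= e^{-(q_{01}+q_{02})t}, \\
Q_{22.0}(t) &= e^{-(q_{20}+q_{23})t}, \\
Q_{01.1}(t) &= \frac{q_{01}}{q_{01}+q_{02}}\,\bigl(1 - Q_{00.0}(t)\bigr), \\
Q_{02.1}(t) &= q_{02}\,\frac{Q_{00.0}(t) - Q_{22.0}(t)}{q_{20}+q_{23}-q_{01}-q_{02}}, \\
Q_{20.1}(t) &= \frac{q_{20}}{q_{02}}\,Q_{02.1}(t), \\
Q_{21.1}(t) &= q_{01}q_{20}\left[\frac{1 - Q_{22.0}(t)}{(q_{01}+q_{02})(q_{20}+q_{23})}
  - \frac{Q_{00.0}(t) - Q_{22.0}(t)}{(q_{01}+q_{02})(q_{20}+q_{23}-q_{01}-q_{02})}\right].
\end{aligned}
\tag{4}
\]
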
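
Formulae (4) are simple enough to evaluate directly. The sketch below is an illustration only: the function name `single_transfer_probs` and the dictionary keys are hypothetical, and the code assumes $q_{01}+q_{02} \ne q_{20}+q_{23}$ so that no denominator vanishes.

```python
import math

def single_transfer_probs(q01, q02, q20, q23, t):
    """Evaluate the six probabilities of formulae (4) for elapsed time t."""
    a = q01 + q02   # total intensity of leaving S0
    b = q20 + q23   # total intensity of leaving S2
    if a == b:
        raise ValueError("formulae (4) as written require q01+q02 != q20+q23")
    Q00_0 = math.exp(-a * t)                      # stays in S0 throughout
    Q22_0 = math.exp(-b * t)                      # stays in S2 throughout
    Q01_1 = q01 / a * (1 - Q00_0)                 # dies without any recovery
    Q02_1 = q02 * (Q00_0 - Q22_0) / (b - a)       # one recovery, no relapse
    Q20_1 = q20 / q02 * Q02_1 if q02 else q20 * (Q00_0 - Q22_0) / (b - a)
    Q21_1 = q01 * q20 * ((1 - Q22_0) / (a * b)    # one relapse, then death
                         - (Q00_0 - Q22_0) / (a * (b - a)))
    return {"Q00_0": Q00_0, "Q22_0": Q22_0, "Q01_1": Q01_1,
            "Q02_1": Q02_1, "Q20_1": Q20_1, "Q21_1": Q21_1}
```

Calling `single_transfer_probs(0.1, 0.5, 0.05, 0.02, 1.0)` with these purely illustrative intensities returns the theoretical counterparts of $\phi_{00}$, $\phi_{22}$, $\phi_{01}$, $\phi_{02}$, $\phi_{20}$ and $\phi_{21}$.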
There is little doubt that the determination of the relative frequencies
$\phi_{00}$, $\phi_{01}$, $\phi_{02}$, $\phi_{22}$, $\phi_{20}$ and $\phi_{21}$ represents a substantial complication
of current practice in follow-up studies. Nevertheless, the
authors feel confident that these data could be collected if it seemed
worthwhile to make the effort. Promising sources of such data are the
armed services who may be expected to keep track of their personnel
and also institutions of health insurance. In particular, one might 
expect a very substantial amount of information from the British 
National Health Service. 


6. PROBLEMS OF ESTIMATION 


Although details of various problems of estimation, of the precision 
attainable with a given scheme of observations, and of testing various 
hypotheses, will be given in another publication, it seems appropriate 
to indicate the general idea of their treatment. 

It may be useful to begin with a description of the estimates used. 
Originally, they were called BAN estimates which stands for “best
asymptotically normal” estimates. For reasons given in Part 3 of the
appendix, it now appears desirable to alter this label to RBAN estimates,
connoting “regular best asymptotically normal” estimates. BAN estimates
were introduced in a paper [9] by one of the authors and have
the following properties: In general, consider several, say $s$, sequences
of completely independent trials. Let

\[ R_{i1}, R_{i2}, \ldots, R_{i\nu_i}, \qquad i = 1, 2, \ldots, s, \]

denote the possible exclusive outcomes of each trial belonging to the $i$-th
sequence and let

\[ p_{i1}, p_{i2}, \ldots, p_{i\nu_i} \]


be the probabilities of these outcomes. We assume that these prob- 
abilities do not change from one trial to another of the same sequence 
and that none of them is equal to zero. 

In relation to the follow-up studies, the fate of a single individual
during a period $T$ of observation represents a single trial. If all
individuals considered are in $S_0$ at the beginning of period $T$, then we
deal with just one sequence of trials and $R_{11}, R_{12}, \ldots, R_{1\nu_1}$ represent
the mutually exclusive possibilities regarding each individual that we
observe at the end of $T$. If at the beginning of the period of observation,
some of the individuals studied are in $S_0$ and others in $S_2$, then
we have $s = 2$ series of trials. $R_{11}, R_{12}, \ldots, R_{1\nu_1}$ are the various
possibilities corresponding to each individual originally in $S_0$ and
$R_{21}, R_{22}, \ldots, R_{2\nu_2}$ are the possibilities corresponding to each individual
originally in $S_2$. In each case, the probabilities denoted here by $p_{ij}$
are given by the formulae (1), (2) and/or (4), in accordance with the
exact definition of the possible outcomes of observation $R_{ij}$.

We consider the case when each probability $p_{ij}$ is a specified function,
say $f_{ij}(\theta_1, \theta_2, \ldots, \theta_m)$, of several parameters $\theta_1, \theta_2, \ldots, \theta_m$, having
continuous partial derivatives up to the second order with respect to all
the $\theta$'s. In the present case, the role of these parameters is played by
the four intensities of risk, $q_{01}$, $q_{02}$, $q_{20}$ and $q_{23}$.

Denote by $N_i$ the number of trials in the $i$-th sequence and let
$N = N_1 + N_2 + \cdots + N_s$ be their sum. In the situation in which
BAN estimates are defined, it is assumed that the number $N_i$ of trials
in the $i$-th sequence grows without limit in such a way that each ratio,

\[ \frac{N_i}{N}, \]

remains constant or, at least, is bounded away from zero. Denote by $n_{ij}$
the number of occurrences of the outcome $R_{ij}$ in the course of the $N_i$
trials forming the $i$-th sequence. Let further

\[ \phi_{ij} = \frac{n_{ij}}{N_i} \]

stand for the relative frequency of this outcome. Consider the class $C_k$
of functions of the $\phi$'s so defined having the following properties:


(i) every function belonging to $C_k$ does not depend explicitly on $N$;

(ii) every function belonging to $C_k$ has continuous partial derivatives
of first order with respect to every $\phi_{ij}$;

(iii) as $N \to \infty$, every function belonging to $C_k$ converges in probability
to the true value, say $\theta_k^0$, of the parameter $\theta_k$.

It is proved that the class $C_k$ so defined possesses also the following
property:

(iv) every function of class $C_k$ is asymptotically normally distributed
about the asymptotic mean $\theta_k^0$ with asymptotic variance $\sigma_k^2/N$ where
$\sigma_k$ does not depend on $N$.


Functions of class $C_k$ will be called “regular asymptotically normal
estimates” of $\theta_k$.


DEFINITION. If a function $\hat\theta_k$ of the relative frequencies belongs to
class $C_k$ (that is, if it possesses properties (i), (ii), (iii) and, therefore,
(iv)) and if its asymptotic variance $\sigma_k^2/N$ does not exceed the asymptotic
variance of any other regular asymptotically normal estimate of the
parameter $\theta_k$, then $\hat\theta_k$ is called the regular best asymptotically normal
estimate of $\theta_k$ (RBAN estimate, for short).


In [9] just quoted, it is shown that RBAN estimates always exist 
and can be obtained by any one of several specified methods. One of 
these methods is the method of maximum likelihood. Another is the 
method of minimum modified $\chi^2$ with reduced side conditions. At this
point it may be useful to mention that, because of the apparent similarity 
between the concept of RBAN estimates on the one hand and consistent 
and asymptotically efficient estimates on the other, some of our friends 
have expressed doubts on whether or not the introduction of the new 
term is justifiable. This question is discussed in Part 3 of the appendix 
and it will be seen that the two concepts are different. In particular, 
it appears that, at least in the case discussed in this paper, the maximum 
likelihood estimates satisfy the definition of RBAN estimates but, con- 
trary to general belief, not the definition of asymptotically efficient 
estimates. 

While the asymptotic properties of RBAN estimates obtained by 
different methods are the same, the finite sample properties may differ 
considerably. In particular it may be expected that the use of the 
minimum modified $\chi^2$ method with reduced side conditions will require
a substantial number of observations in order to achieve a good approxi- 
mation to the asymptotic normal distribution of the estimates. On the 
other hand, as Berkson [10] showed, the minimization of a sum analogous 
to the familiar $\chi^2$ leads in certain cases to estimates with finite sample
variances which are smaller than those of maximum likelihood estimates. 

The simplest problem of estimation connected with the stochastic
models of risks studied in this paper corresponds to the following
“simplified” scheme of observations:

We assume that at the beginning of a period of observation of length
$T$ there are $N_0$ individuals in $S_0$ and $N_2$ in $S_2$. These two groups form
two systems of “independent trials.” At the conclusion of time $T$, we
separate the individuals into categories in this fashion. First, consider
the individuals who are in $S_0$ at the beginning of time $T$.


$N_{00}$ = the number who survived the period of observation without
having a single recovery (property $R_{00}$),

$N_{01}$ = the number who died from cancer without having a single
recovery (property $R_{01}$),

$N_{03}$ = the number of all other individuals originally in $S_0$ (property
$R_{03}$).

Thus $N_{03} = N_0 - N_{00} - N_{01}$, where $N_0$ denotes the number of individuals
who were originally in $S_0$.

Now turning to the $N_2$ individuals who originally were in $S_2$, we
separate them as follows:

$N_{22}$ = the number who survived the period of observation $T$ without
a relapse (property $R_{22}$),

$N_{20}$ = the number who survived the period of observation, had a
relapse and did not have a single recovery (property $R_{20}$),

$N_{23}$ = the number of all other individuals originally in $S_2$ (property
$R_{23}$).

Thus, $N_{23} = N_2 - N_{22} - N_{20}$.

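In general, we use the notation,

\[ \phi_{ij} = \frac{N_{ij}}{N_i}. \]

On referring to formulae (4), the probability of each possible outcome
of the trials in each sequence is easily found and the reader will have
no difficulty in verifying that the maximum likelihood estimates, say
$\hat q_{ij}$, of the four intensities of risk are

\[
\begin{aligned}
\hat q_{01} &= -\frac{\log \phi_{00}}{T}\,\frac{\phi_{01}}{1 - \phi_{00}}, \\
\hat q_{02} &= -\frac{\log \phi_{00}}{T}\,\frac{1 - \phi_{00} - \phi_{01}}{1 - \phi_{00}}, \\
\hat q_{20} &= \frac{\log \phi_{00} - \log \phi_{22}}{T}\,\frac{\phi_{20}}{\phi_{00} - \phi_{22}}, \\
\hat q_{23} &= -\frac{\log \phi_{22}}{T} - \hat q_{20}.
\end{aligned}
\]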


As another illustration of the problem of estimation we will consider
the system of observations outlined in Section 3. We will assume that
at the beginning of a period of observation of length $T$ there are $N_0$
persons in state $S_0$ and $N_2$ persons in state $S_2$. At the expiration of $T$,
each of these groups is found distributed among four states and we
denote by $N_{ij}$ the numbers of individuals originally in $S_i$ who are found
in $S_j$ at the end of $T$, where $i = 0, 2$ and $j = 0, 1, 2, 3$. Let $\phi_{ij} = N_{ij}/N_i$.


In the present case, the only method of obtaining RBAN estimates
of the four intensities of risk which appears feasible is the method of
minimum modified $\chi^2$. This reduces to finding the values of $Q^*_{ij}$ which
minimize the expression

\[ N_0 \sum_{j=0}^{3} \frac{(Q^*_{0j} - \phi_{0j})^2}{\phi_{0j}} + N_2 \sum_{j=0}^{3} \frac{(Q^*_{2j} - \phi_{2j})^2}{\phi_{2j}} \]

under the side conditions,

\[ \sum_{j=0}^{3} Q^*_{0j} = \sum_{j=0}^{3} Q^*_{2j} = 1 \]

and, say, $F_1 = 0$ and $F_2 = 0$, where $F_1$ and $F_2$ are functions of the $Q^*_{ij}$
[the explicit expressions for $F_1$ and $F_2$ are illegible in the source].

Since the last two conditions are awkward, especially the last, it is
unavoidable to substitute for them their reduced forms. Details of the
procedure involved are explained in an article by Chiang [11]. After
the minimizing values of the $Q^*_{ij}$ are obtained, formulae (1) and (2)
yield the RBAN estimates of the $q_{ij}$.
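
The quantity being minimized is easy to evaluate for any trial set of crude rates. The sketch below assumes the usual form of the modified $\chi^2$, with the observed relative frequencies in the denominators; the function name `modified_chi_square` is hypothetical, and the minimization itself and the side conditions are deliberately left out.

```python
def modified_chi_square(N0, phi0, Q0, N2, phi2, Q2):
    """Modified chi-square for two sequences of trials.

    phi0, phi2: observed relative frequencies (all assumed non-zero)
    Q0, Q2    : trial values of the crude rates Q*_0j and Q*_2j, j = 0..3
    """
    s0 = sum((q - f) ** 2 / f for q, f in zip(Q0, phi0))
    s2 = sum((q - f) ** 2 / f for q, f in zip(Q2, phi2))
    return N0 * s0 + N2 * s2
```

The expression vanishes when every trial rate equals the corresponding observed frequency, and grows as the trial rates move away from the observations.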

The two examples just outlined illustrate the dependence of the form 
of estimation problem on the system of observations in the follow-up 
procedure. In a future paper, we propose to study in detail the accu- 
racy of the estimates obtained under the various possible systems of 
observations. 


APPENDIX 


A1. INTRODUCTION


We have assembled in the appendix the mathematical discussions 
related to the main part of the paper. In Section A2 we deduce the 
various probabilities connected with the stationary Markov process (see 
Feller [12]) which is postulated to represent the machinery behind the 
varying fates of individuals of a specified basic group. Some of the 
probabilities deduced are given in the main part of the paper. In
Section A3 we discuss the distinction between RBAN estimates on the
one hand and consistent and asymptotically efficient estimates on the 
other. 

A2. PROBABILITIES OF MULTIPLE TRANSFERS 


We consider a system of four states $S_0$, $S_1$, $S_2$, $S_3$ with possible
transfers as indicated in Fig. 3, so that $S_1$ and $S_3$ are terminal states.
For $i$ equal either 0 or 2, for $j = 0, 1, 2, 3$, for $k = 0, 1, 2, \ldots$ and for
any two numbers $t_1$, $t_2$, $0 \le t_1 \le t_2 \le T$, we postulate the existence of a
probability $Q_{ij.k}(t_1, t_2)$ that an individual found in $S_i$ at moment $t = t_1$
will leave this state exactly $k$ times before the arrival of moment $t = t_2$,
and that at $t = t_2$ this individual will be found in state $S_j$. In accordance
with this definition, we postulate

\[ Q_{ii.0}(t_1, t_1) = 1, \qquad Q_{ii.k}(t_1, t_1) = 0 \]

for $i = 0, 2$ and $k \ge 1$. Further, if $i \ne j$, then

\[ Q_{ij.k}(t_1, t_1) = 0. \]


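It will be convenient to use the symbol $Q_{ij.k}(t_1, t_2)$ for negative values
of $k$ also. In this case, the value of the symbol will be zero. Further,
we postulate that each probability $Q_{ij.k}(t_1, t_2)$ has a partial derivative
with respect to $t_2$, at least at $t_2 = t_1$, and that, if $0 \le t_1 \le T$, this
derivative has a value independent of $t_1$. In order to give effect to the
postulated machinery of transfer from one state to another, we will
assume that, for probabilities $Q_{ij.k}$ which refer to more than one transfer
between $t_1$ and $t_2$, these derivatives are equal to zero so that for $i = 0, 2$
and $k \ge 1$,

\[ \frac{\partial Q_{ii.k}(t_1, t_2)}{\partial t_2}\bigg|_{t_2 = t_1} = 0 \]

and, similarly,

\[ \frac{\partial Q_{ij.k}(t_1, t_2)}{\partial t_2} = 0 \]

for every other combination of subscripts requiring more than one transfer,
if $t_2$ is set equal to $t_1$. As to the other derivatives, we shall denote
them by $q$ with appropriate subscripts indicating the initial and the
final states in which the individual is supposed to be found at moments
$t_1$ and $t_2$ respectively. Specifically, we shall consider the following six
derivatives, all evaluated at point $t_2 = t_1$,

\[
\begin{aligned}
q_{00} &= \frac{\partial Q_{00.0}(t_1, t_2)}{\partial t_2}, &
q_{01} &= \frac{\partial Q_{01.1}(t_1, t_2)}{\partial t_2}, &
q_{02} &= \frac{\partial Q_{02.1}(t_1, t_2)}{\partial t_2}, \\
q_{20} &= \frac{\partial Q_{20.1}(t_1, t_2)}{\partial t_2}, &
q_{22} &= \frac{\partial Q_{22.0}(t_1, t_2)}{\partial t_2}, &
q_{23} &= \frac{\partial Q_{23.1}(t_1, t_2)}{\partial t_2},
\end{aligned}
\]

and describe them as intensities of risk.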
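It follows from the definition of the probabilities $Q_{ij.k}$ that

\[ \sum_{j=0}^{3} \sum_{k=0}^{\infty} Q_{ij.k}(t_1, t_2) = 1. \]

By differentiating this identity with respect to $t_2$ at $t_2 = t_1$, it is found
that

\[ q_{00} = -q_{01} - q_{02}, \qquad q_{22} = -q_{20} - q_{23}. \]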

It is easy to see that the intensities of risk $q_{01}$, $q_{02}$, $q_{20}$ and $q_{23}$ are
necessarily non-negative numbers and that, therefore, the intensities $q_{00}$
and $q_{22}$ are non-positive. For example, in order to establish that $q_{01} \ge 0$,
we notice that for $t_2 = t_1$ the probability $Q_{01.1}(t_1, t_1) = 0$. Thus, if $t_2$
is increased, $Q_{01.1}(t_1, t_2)$ cannot decrease and it follows that its derivative
$q_{01}$ cannot be negative.

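Starting with the above definitions and postulates we will now deduce
formulae for all the probabilities $Q_{ij.k}$. We begin with formulae for
probabilities relating to transfers between the two states $S_0$ and $S_2$.

Let $0 \le t_1 \le t_2 < t_2 + \tau \le T$. Put $k = 0, 1, 2, \ldots$ and consider
the probability $Q_{00.k}(t_1, t_2 + \tau)$. This probability can be expressed in
terms of probabilities of transfers between moments $t_1$ and $t_2$ and
between moments $t_2$ and $t_2 + \tau$ as follows:

\[
\begin{aligned}
Q_{00.k}(t_1, t_2 + \tau) ={}& Q_{00.k}(t_1, t_2)\, Q_{00.0}(t_2, t_2 + \tau) \\
&+ \sum_{m=1}^{k} Q_{00.k-m}(t_1, t_2)\, Q_{00.m}(t_2, t_2 + \tau) \\
&+ Q_{02.k}(t_1, t_2)\, Q_{20.1}(t_2, t_2 + \tau) \\
&+ \sum_{m=1}^{k-1} Q_{02.k-m}(t_1, t_2)\, Q_{20.m+1}(t_2, t_2 + \tau).
\end{aligned}
\]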


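Subtract $Q_{00.k}(t_1, t_2)$ from both sides of this equation, divide the
difference by $\tau$ and pass to the limit as $\tau \to 0$. The hypotheses made
imply that this limit exists and that, namely,

\[ \frac{\partial Q_{00.k}(t_1, t_2)}{\partial t_2} = q_{00}\, Q_{00.k}(t_1, t_2) + q_{20}\, Q_{02.k}(t_1, t_2). \tag{5} \]

Similarly, it is found

\[ \frac{\partial Q_{02.k}(t_1, t_2)}{\partial t_2} = q_{02}\, Q_{00.k-1}(t_1, t_2) + q_{22}\, Q_{02.k}(t_1, t_2). \tag{6} \]

By substituting successive values $0, 1, 2, \ldots$ for $k$, we could determine
in turn $Q_{00.0}$, $Q_{02.1}$, $Q_{00.1}$, etc. However, it is simpler to introduce
probability generating functions and thus to determine all probabilities in
one stroke. Let $0 \le u \le 1$ and write

\[ G(u, t_2) = \sum_{k=0}^{\infty} u^k\, Q_{00.k}(t_1, t_2), \tag{7} \]

\[ H(u, t_2) = \sum_{k=1}^{\infty} u^{k-1}\, Q_{02.k}(t_1, t_2). \tag{8} \]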

It is seen that, if the two probability generating functions $G$ and $H$
are determined, then any of the probabilities discussed can be obtained
simply by taking a derivative of a sufficiently high order, setting $u = 0$
and dividing the result by a factorial. Thus,

\[ Q_{00.k}(t_1, t_2) = \frac{1}{k!} \frac{\partial^k G}{\partial u^k}\bigg|_{u=0}, \tag{9} \]

\[ Q_{02.k+1}(t_1, t_2) = \frac{1}{k!} \frac{\partial^k H}{\partial u^k}\bigg|_{u=0}, \tag{10} \]

for $k = 0, 1, 2, \ldots$. Furthermore, the results of substituting $u = 1$
into both $G$ and $H$ are of particular interest. In fact, by inspecting
(7) and (8), it is easy to see that

\[ G(1, t_2) = Q^*_{00}(t_1, t_2) \]

which is the probability defined in Section 3 of the main body of the
paper that an individual who at moment $t = t_1$ is in state $S_0$ will be
found in this state at moment $t = t_2$ (irrespective of whether or not
he is transferred to $S_2$, once or more, in between the moments $t_1$ and $t_2$).
Similarly,

\[ H(1, t_2) = Q^*_{02}(t_1, t_2). \]

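In order to determine the two probability generating functions $G$ and
$H$, we multiply equation (5) by $u^k$ and sum for $k$ from zero to infinity.
The result is

\[ \frac{\partial G(u, t_2)}{\partial t_2} = q_{00}\, G(u, t_2) + u\, q_{20}\, H(u, t_2). \tag{11} \]

Similarly, from (6)

\[ \frac{\partial H(u, t_2)}{\partial t_2} = q_{02}\, G(u, t_2) + q_{22}\, H(u, t_2). \tag{12} \]

It follows from the definition of $G$ and $H$ that, at $t_2 = t_1$,

\[ G(u, t_1) = 1, \qquad H(u, t_1) = 0. \tag{13} \]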
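Upon applying the usual methods for solution of the system (11) and
(12) with the boundary conditions (13), it is found that

\[ G(u, t_2) = \frac{(v_2(u) + q_{22})\, e^{-v_2(u) t} - (v_1(u) + q_{22})\, e^{-v_1(u) t}}{v_2(u) - v_1(u)}, \tag{14} \]

\[ H(u, t_2) = q_{02}\, \frac{e^{-v_1(u) t} - e^{-v_2(u) t}}{v_2(u) - v_1(u)}, \tag{15} \]

where, for brevity, $t = t_2 - t_1$ and

\[
\begin{aligned}
v_1(u) &= -\tfrac{1}{2}(q_{00} + q_{22}) - \tfrac{1}{2}\sqrt{(q_{00} - q_{22})^2 + 4u\, q_{02} q_{20}}, \\
v_2(u) &= -\tfrac{1}{2}(q_{00} + q_{22}) + \tfrac{1}{2}\sqrt{(q_{00} - q_{22})^2 + 4u\, q_{02} q_{20}}.
\end{aligned}
\]

It will be observed that both functions $G$ and $H$ depend on the
duration of the time interval from $t_1$ to $t_2$, that is, on $t = t_2 - t_1$, but
not on $t_1$. Thus, it will be convenient to write henceforth $G(u, t)$ and
$H(u, t)$ for the former $G(u, t_2)$ and $H(u, t_2)$ respectively, with the
understanding that the new argument $t$ stands for the former $t_2 - t_1$.
A similar convention will apply to notation relating to probabilities
$Q_{ij.k}(t_1, t_2)$. Instead of this symbol we will write simply $Q_{ij.k}(t)$. Now
we can proceed to the computation of the probabilities $Q_{01.k}(t)$ and
$Q_{03.k}(t)$ for $k = 1, 2, \ldots$. The basic relations are:

\[
\begin{aligned}
Q_{01.k}(t + \tau) ={}& Q_{01.k}(t) + Q_{00.k-1}(t)\, Q_{01.1}(\tau) \\
&+ \sum_{m=2}^{k} Q_{00.k-m}(t)\, Q_{01.m}(\tau) + \sum_{m=1}^{k-1} Q_{02.k-m}(t)\, Q_{21.m}(\tau), \\
Q_{03.k}(t + \tau) ={}& Q_{03.k}(t) + Q_{02.k}(t)\, Q_{23.1}(\tau) \\
&+ \sum_{m=1}^{k} Q_{00.k-m}(t)\, Q_{03.m}(\tau) + \sum_{m=1}^{k-1} Q_{02.k-m}(t)\, Q_{23.m+1}(\tau).
\end{aligned}
\]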


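Following steps exactly similar to those above, we obtain

\[ \frac{\partial Q_{01.k}(t_1, t_2)}{\partial t_2} = q_{01}\, Q_{00.k-1}(t_1, t_2), \qquad
   \frac{\partial Q_{03.k}(t_1, t_2)}{\partial t_2} = q_{23}\, Q_{02.k}(t_1, t_2). \]

Let

\[ I(u, t_2) = \sum_{k=1}^{\infty} u^{k-1}\, Q_{01.k}(t_1, t_2), \qquad
   J(u, t_2) = \sum_{k=1}^{\infty} u^{k-1}\, Q_{03.k}(t_1, t_2) \]

be two functions generating probabilities $Q_{01.k}$ and $Q_{03.k}$ respectively.
It is seen that

\[ \frac{\partial I(u, t_2)}{\partial t_2} = q_{01}\, G(u, t_2 - t_1), \qquad
   \frac{\partial J(u, t_2)}{\partial t_2} = q_{23}\, H(u, t_2 - t_1), \]

or, since for $t_2 = t_1$

\[ I(u, t_1) = J(u, t_1) = 0, \]

\[ I(u, t_2) = q_{01} \int_{t_1}^{t_2} G(u, t_3 - t_1)\, dt_3, \qquad
   J(u, t_2) = q_{23} \int_{t_1}^{t_2} H(u, t_3 - t_1)\, dt_3. \]

Using again the argument $t = t_2 - t_1$ instead of $t_2$, the explicit formulae
for $I$ and $J$ can be written as follows:

\[ I(u, t) = q_{01}\, \frac{v_1(u)\,(v_2(u) + q_{22})\,(1 - e^{-v_2(u) t}) - v_2(u)\,(v_1(u) + q_{22})\,(1 - e^{-v_1(u) t})}{v_1(u)\, v_2(u)\, (v_2(u) - v_1(u))}, \tag{16} \]

\[ J(u, t) = q_{02} q_{23}\, \frac{v_2(u)\,(1 - e^{-v_1(u) t}) - v_1(u)\,(1 - e^{-v_2(u) t})}{v_1(u)\, v_2(u)\, (v_2(u) - v_1(u))}. \tag{17} \]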


Formulae (14) through (17) represent the complete solution of the
problem of probabilities of transfers from state $S_0$. If we substitute
$u = 1$ and notice that $\lambda_i = v_i(1)$ for $i = 1, 2$, we obtain formulae (1)
for the probabilities $Q^*_{0j}(t)$ given in Section 3. Also every probability
of the type $Q_{0j.k}(t)$ may be obtained by the procedure in formulae (9)
and (10). In particular, it is interesting to find $Q_{00.0}(t)$, $Q_{01.1}(t)$ and
$Q_{02.1}(t)$. These probabilities are obtained by substituting $u = 0$ into
$G$, $H$ and $I$. Easy algebra gives

\[
\begin{aligned}
Q_{00.0}(t) &= e^{-(q_{01} + q_{02}) t}, \\
Q_{01.1}(t) &= \frac{q_{01}}{q_{01} + q_{02}}\,\bigl(1 - Q_{00.0}(t)\bigr), \\
Q_{02.1}(t) &= q_{02}\, \frac{e^{-(q_{01} + q_{02}) t} - e^{-(q_{20} + q_{23}) t}}{q_{20} + q_{23} - q_{01} - q_{02}}.
\end{aligned}
\]
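
The generating functions (14) through (17) can be checked numerically. The sketch below is an illustration under stated assumptions: the function name `generating_functions` is hypothetical, $H$, $I$ and $J$ are taken to be indexed so that $u = 0$ yields $Q_{02.1}$, $Q_{01.1}$ and $Q_{03.1}$, and the intensities are assumed to be such that $v_1(u) \ne v_2(u)$ and $v_1(u) v_2(u) \ne 0$.

```python
import math

def generating_functions(q01, q02, q20, q23, u, t):
    """Return (G, H, I, J) of formulae (14)-(17) for transfers from S0."""
    q00 = -(q01 + q02)
    q22 = -(q20 + q23)
    root = math.sqrt((q00 - q22) ** 2 + 4 * u * q02 * q20)
    v1 = -0.5 * (q00 + q22) - 0.5 * root
    v2 = -0.5 * (q00 + q22) + 0.5 * root
    e1, e2 = math.exp(-v1 * t), math.exp(-v2 * t)
    d = v2 - v1
    G = ((v2 + q22) * e2 - (v1 + q22) * e1) / d
    H = q02 * (e1 - e2) / d
    I = q01 * (v1 * (v2 + q22) * (1 - e2)
               - v2 * (v1 + q22) * (1 - e1)) / (v1 * v2 * d)
    J = q02 * q23 * (v2 * (1 - e1) - v1 * (1 - e2)) / (v1 * v2 * d)
    return G, H, I, J
```

With $u = 1$ the four values are the crude rates $Q^*_{00}(t)$, $Q^*_{02}(t)$, $Q^*_{01}(t)$ and $Q^*_{03}(t)$, which must add up to unity; with $u = 0$ the first three reduce to the elementary expressions just given.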


Exactly similar reasoning applies to functions generating the prob- 
abilities Q.;x(¢) of transfers from state S, and they may easily be 
written by analogy. In particular, we have 


Qoo.o(t) 


Qos.1(t) Qe2.0(t)), 


Qooo(t) — 


(or + Yo2)(q20 + zs) + Y23 — Yor — Yo2 


As mentioned in the main part of the paper, by observing the 
empirical counterparts of these probabilities, one obtains data leading 
to estimation procedures that are somewhat simpler than those based 
on probabilities of the type $Q^*_{ij}(t)$. In order not to overload a single
paper, the study of various possible schemes of observations is relegated 
to another publication. 


Qori(t) 


A3. RBAN ESTIMATES VS ASYMPTOTICALLY EFFICIENT ESTIMATES 


As mentioned in the main part of this paper, this section of the
appendix is found necessary because of the apprehension of some
statisticians as to the distinction between BAN estimates introduced by
Neyman in 1945 and called here RBAN estimates and the long familiar
estimates originally described by Fisher [13] as consistent and efficient
but now, following Cramér [14], labeled consistent and asymptotically
efficient. Another reason for considering the question is that in the
wording of the original definition of Neyman, a certain amount of
sloppiness was detected which it is useful to correct. While the following
discussion may have a broader application, we shall specifically be
concerned with the case studied in this paper where the observable random
variables are the relative frequencies $\varphi_{ij}$ defined in Section 6.

The reworded Neyman definition of RBAN estimates is given in 
Section 6. For convenience of comparison, we repeat it here. 

Contemplating the estimation of a parameter $\theta_i$, with its true value
denoted by $\theta_i^0$, we consider the class $C_1$ of estimates which we call
regular asymptotically normal (RAN, for short). Every estimate of
this class is characterized by the following three properties which serve
as a definition of the class $C_1$:

(i) the RAN estimate is a function of the observable relative
frequencies $\varphi_{ij}$ but does not explicitly depend on the number N of absolutely
independent trials from which the relative frequencies $\varphi_{ij}$ are obtained,

(ii) the RAN estimate has continuous partial derivatives of the
first order with respect to every $\varphi_{ij}$,

(iii) as the number N of observations is indefinitely increased, the
RAN estimate converges in probability to the true value $\theta_i^0$ of the
estimated parameter.

These three properties imply also a fourth:

(iv) If $\hat\theta_N$ is a RAN estimate of $\theta_i$ then, as the number N of
observations is indefinitely increased, the distribution of the product
$(\hat\theta_N - \theta_i^0)\sqrt{N}$ tends to the normal with mean zero and with variance
$\sigma^2$ which is independent of N.

It has been proved by Neyman that, when each probability $p_{ij}$ has
continuous partial derivatives up to the second order with respect to
all parameters involved, then the class $C_1$ of RAN estimates always
contains at least one estimate, say $\theta^*_N$, such that the variance $\sigma^{*2}$ of
the asymptotic normal distribution of $(\theta^*_N - \theta_i^0)\sqrt{N}$ is less than
(or at most equal to) that corresponding to any other RAN estimate.
Every estimate $\theta^*_N$ of this kind is then called the regular best asymptotic
estimate of $\theta_i$, or RBAN, for short.

Now this definition may be contrasted with the definition of 
consistent and efficient (in more modern terminology: asymptotically 
efficient) estimate as given by Fisher [13] and then repeated in many 
books. This definition may as well be quoted from the latest book by 
Mood [15]. On p. 150, Mood writes: 


“Efficient. In a great many estimation problems it is possible to construct
estimates $\hat\theta(x_1, x_2, \ldots, x_n)$, such that $\sqrt{n}(\hat\theta - \theta)$ has a normal distribution
with zero mean in the limit as the sample size n increases. Confining our
attention to this class of estimates (and assuming such a class exists), there
may be one or more estimators which will have a limiting variance which is
smaller than the limiting variance of the other estimators. These estimators
which have the smallest limiting variance are called efficient estimators of $\theta$.”


In relation to the case considered in the present paper, this definition
may be reworded as follows: When contemplating the estimation of a
parameter $\theta_i$, with its true value $\theta_i^0$, we consider the class, say C, of
functions of the $\varphi_{ij}$ having the above properties (iii) and (iv) but not
necessarily (i) and (ii). Then, to be efficient, an estimator must possess
properties (iii) and (iv) and, in addition, its asymptotic variance must
be at most equal to that of any other function belonging to C.

The reason Neyman introduced the term “ BAN estimate ” is that 
he was unable to prove that either the maximum likelihood estimates 
or some other estimates discussed in his paper satisfy the above definition 
of efficiency. In particular, he was unable to prove that the asymptotic 
variance of the maximum likelihood estimates is necessarily minimum 
within the whole class C. On the other hand, he was able to prove the 
minimum property with respect to the narrower class $C_1$ characterized
by properties (i) and (ii) as well as by (iii) and (iv). Recently, 
Joseph L. Hodges, Jr. provided a number of examples in which the 
maximum likelihood estimates and, more generally, the RBAN estimates 
do not satisfy the definition of efficiency. Furthermore, these examples 
indicate that, at least in certain cases, asymptotically efficient estimates 
do not exist at all. 

Here is a simple example illustrating these points. Consider a single
sequence of n completely independent trials, each with only two possible
outcomes, “success” or “failure.” Assume that the probability of
success p is the same in all the trials and 0 < p < 1. The value of p
is unknown and is to be estimated. The maximum likelihood estimate
of p is, say

$$\hat p = \frac{\text{number of successes}}{n}.$$

Its variance is $\sigma_0^2/n$, say, with $\sigma_0^2 = p^0(1 - p^0)$ where $p^0$ represents
the true value of p. As is well known, $\hat p$ is asymptotically normal
about $p^0$. However, contrary to the most widespread beliefs, it is not
asymptotically efficient. In order to see this, select a number a,
0 < a < 1, and define an alternative estimate, say p(a, n), as follows:

For $n \le 16$,

$$p(a, n) = \hat p.$$

For $n > 16$ and $|\hat p - \tfrac12| > n^{-1/4}$,

$$p(a, n) = \hat p.$$

For $n > 16$ and $|\hat p - \tfrac12| \le n^{-1/4}$,

$$p(a, n) = \tfrac12 + a\left(\hat p - \tfrac12\right).$$


It is seen that p(a,n) depends explicitly on n and, therefore, is not a 
regular asymptotically normal estimate of p. The reader will have no 
difficulty in verifying that the following two statements hold: 


(a) If p has the value $p^0 \ne 1/2$, then the probability that

$$p(a, n) \ne \hat p$$

tends to zero as $n \to \infty$. This implies that p(a, n) is asymptotically
normal about $p^0$ with asymptotic variance equal to that of $\hat p$.


(b) If p happens to be equal to $p^0 = 1/2$, then p(a, n) is again
asymptotically normal about the true value of p, but its asymptotic
variance has the value, say $\sigma_a^2/n$ with

$$\sigma_a^2 = a^2 p^0 (1 - p^0) = \frac{a^2}{4} \tag{18}$$


and this is less than the asymptotic variance of the maximum likelihood
estimate $\hat p$. It follows that, in the present case, the maximum likelihood
estimate is not asymptotically efficient. Furthermore, it is easy to prove
that in this example there is no efficient estimate at all. In fact, let p*
denote any of the estimates belonging to class C and let $\sigma^{*2}(p^0)/n$ be
its asymptotic variance. It is easy to see that p* is not efficient. We
notice first that $\sigma^{*2}(1/2) > 0$ because otherwise p* would not be
asymptotically normal. Now choose

$$a' = \sigma^*(\tfrac12) \quad \text{(say)}$$

and consider p(a′, n). If it happens that $p^0 = 1/2$, then, according to
(18), the asymptotic variance of p(a′, n) is $\sigma^{*2}(1/2)/4n$, smaller than
that of p*, and it follows that p* is not an efficient estimate. Thus,
the concept of RBAN estimates is essentially different from the concept
of consistent and asymptotically efficient estimates.
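
Statements (a) and (b) may be checked numerically. The following Python sketch is an illustration added here, not part of the original argument. It assumes that, for n > 16 and $|\hat p - 1/2| \le n^{-1/4}$, the alternative estimate has the form $p(a, n) = 1/2 + a(\hat p - 1/2)$, which is the form consistent with (18); the function names are hypothetical. It evaluates n times the exact mean squared error of p(a, n) from the binomial distribution.

```python
import math


def binomial_pmf(k, n, p):
    """P(K = k) for K ~ Binomial(n, p), computed on the log scale to avoid underflow."""
    log_pmf = (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log(1.0 - p)
    )
    return math.exp(log_pmf)


def shrunken_estimate(successes, n, a):
    """Assumed form of p(a, n): pull p_hat towards 1/2 when it lies close to 1/2."""
    p_hat = successes / n
    if n > 16 and abs(p_hat - 0.5) <= n ** -0.25:
        return 0.5 + a * (p_hat - 0.5)
    return p_hat


def scaled_mse(n, p_true, a):
    """n times the exact mean squared error of p(a, n) about p_true."""
    return n * sum(
        binomial_pmf(k, n, p_true) * (shrunken_estimate(k, n, a) - p_true) ** 2
        for k in range(n + 1)
    )


if __name__ == "__main__":
    n, a = 10_000, 0.5
    print(scaled_mse(n, 0.5, a))  # close to a**2 / 4, as in (18)
    print(scaled_mse(n, 0.3, a))  # close to 0.3 * 0.7, the value for p_hat itself
```

With n = 10,000 and a = 1/2 the first quantity is practically $a^2/4 = .0625$, against .25 for the maximum likelihood estimate, while at $p^0 = .3$ the two estimates are practically indistinguishable.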




REFERENCES 


[1] American Statistical Association jointly with the Biometric Society,
“Biometrics 2: Long time follow-up in morbidity studies.” Session of the
annual meeting held in New York, New York, December 29, 1949.

[2] A. H. Robnett, T. E. Jones and J. B. Hazard. Carcinoma of the breast,
recurrence and survival in 203 patients. Cancer, Vol. 3 (1950), pp. 757-772.

[3] S. W. Harrington. Results of radical mastectomy in 5026 cases of
carcinoma of the breast, various clinical and pathologic factors which influence
the prognosis. The Pennsylvania Medical Journal, January, 1940, pp. 1-15.

[4] S. W. Harrington. Survival rates of radical mastectomy for unilateral and
bilateral carcinoma of the breast. Surgery, Vol. 19 (1946), pp. 154-166.

[5] S. W. Harrington. Surgical treatment of carcinoma of the breast. The
Journal of the Michigan State Medical Society, Vol. 47 (1948), pp. 41-50.

[6] Joseph Berkson and R. P. Gage. Calculation of survival rates for cancer.
Proceedings of the Staff Meetings of the Mayo Clinic, Vol. 25 (1950), pp.
270-286.

[7] J. Neyman. First Course in Probability and Statistics. Holt, New York,
1950.

[8] United Nations World Health Organization, Expert Committee on
Health Statistics, Subcommittee on the Registration of Cases of Cancer
as well as their Statistical Presentation. Report on the first session of the
subcommittee. (Mimeographed report, 21 April 1950).

[9] J. Neyman. Contribution to the theory of the $\chi^2$ test. Proceedings of the
Berkeley Symposium on Mathematical Statistics and Probability,
University of California Press, Berkeley, 1949, pp. 239-273.

[10] Joseph Berkson. Relative precision of minimum $\chi^2$ and maximum
likelihood estimates of regression coefficients. Proceedings of the Second
Berkeley Symposium on Mathematical Statistics and Probability (In press).

[11] Chin Long Chiang. On the design of mass medical surveys. Human
Biology, Vol. 23 (1951), pp. 242-271.

[12] W. Feller. On the theory of stochastic processes, with particular reference
to applications. Proceedings of the Berkeley Symposium on Mathematical
Statistics and Probability, University of California Press, Berkeley, 1949,
pp. 403-432.

[13] R. A. Fisher. On the mathematical foundations of theoretical statistics.
Philos. Trans. Roy. Soc., London, Ser. A, Vol. 222 (1922), pp. 309-368.

[14] Harald Cramér. Mathematical Methods of Statistics. Princeton University
Press, Princeton, 1946.

[15] A. M. Mood. Introduction to the Theory of Statistics. McGraw-Hill, New
York, 1950.




ON THE DESIGN OF MASS MEDICAL SURVEYS¹ ²


BY CHIN LONG CHIANG 
University of California, Berkeley 


1. INTRODUCTION 


The problem studied in this paper is that of the effectiveness of a
mass health survey using a single medical test. The situation
may be exemplified by the mass application of X-ray technique to detect 
those persons in the general population who are suffering from tuber- 
culosis. The problem would be simple if tests were in existence (i) 
which could detect the existing disease without fail and (ii) which 
would never suggest the presence of the disease when in actual fact it 
is absent. Unfortunately no such tests are known and every available 
test is subject to error. Thus, a mass application of a test is bound to 
give a certain number of “ false negatives” and also a certain number 
of “false positives.” Therefore, when a mass health survey using a 
given test is organized, it is essential to inquire about the proportion 
of diseased persons in the population who will be detected by the test. 
Ordinarily, a given test may be applied at different levels of intensity. 
For example, an X-ray diagnosis for tuberculosis may be based on just 
one X-ray photograph, or on two, three, etc. If it is found that the 
proportion of the diseased likely to be detected by the test applied at 
the contemplated level of intensity is meager, then it may be appropriate 
to increase the intensity of the test. 

In addition to the paramount question about the proportion of the 
diseased that will be detected by the test there are several other questions, 
perhaps of lesser weight but still very important, that need to be con- 
sidered. One of these is concerned with the proportion of persons in 
the population who suffer from the given disease. Although a mass 


1 This work was sponsored in part by the Office of Naval Research. 

2 Parts of this paper were presented at the Berkeley meeting of the Institute
of Mathematical Statistics, June 24, 1948, under the title “ Power of certain tests 
relating to medical diagnosis,” by C. L. Chiang and J. L. Hodges, Jr. 




survey will not identify all of them, it appears that the results of the 
survey may yield an estimate of the proportion that these persons bear 
to the total population. Another interesting question is the proportion 
of well persons that the mass survey will alarm unnecessarily by classi- 
fying them as positives. 

Naturally, these and similar questions cannot be answered without 
empirical data. The empirical data contemplated here consist of the
results of a preliminary sample survey using the medical test intended 
for mass application. We shall, then, assume that from the general 
population a reasonably large sample of N individuals is (or will be) 
drawn. We shall further assume that each of these individuals will be 
subjected to the test under consideration repeated n times, with the 
utmost care, to see that these applications of the test are mutually 
independent. For example, in the case of the X-ray test, independence 
may be assured by taking n separate photographs of an individual, 
readjusting the whole equipment each time, and then sending a con- 
siderable number of films, taken from many individuals and arranged 
in a random order, to the radiologist at one time. 

As a result, to each individual in the sample there will correspond
a number k of those tests which gave the result “positive.” Obviously,
k may assume any value from zero to n, k = 0, 1, 2, ..., n. After the
tests on all the N individuals are completed, each individual will be
classified according to his value of k and the letter $N_k$ will denote the
number of individuals for whom exactly k tests out of the n tests gave
the outcome “positive.” Naturally, $N = \sum_{k=0}^{n} N_k$. The distribution
provided by the numbers $N_0, N_1, \ldots, N_n$ will, then, be a summary of
empirical results on which to base the study of the effectiveness of the
proposed mass survey. The main problem considered is that of
determining the number n* of independent tests to be applied to each
individual in the proposed mass survey so that the proportion of diseased
identified will be approximately equal to a preassigned number $\varepsilon$.

Any mathematical study of actual phenomena must begin by building 
up a mathematical model of these phenomena. The validity of the 
conclusions drawn from such a study depends on the precision with 
which the mathematical model used corresponds to the phenomena 
studied. Thus, in the further sections of the paper we will consider 
several alternative models of the variability of the response of particular 




individuals to the given test. These models differ in certain respects 
but have in common an important element. This is that it is postulated 
that to every individual in the population there corresponds a fixed 
number p which represents the probability that the medical test under 
consideration applied once to this individual will give the result 
“ positive.” Thus, if the test is independently applied n times to the 
same individual, the probability that there will be exactly k positive 
outcomes is

$$\frac{n!}{k!\,(n-k)!}\, p^k (1 - p)^{n-k}. \tag{1}$$
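
As a small illustration added here, formula (1) can be evaluated directly. The Python sketch below uses a hypothetical function name and an arbitrary value p = 0.5, chosen only for ease of checking and not taken from any data in this paper.

```python
from math import comb


def prob_k_positive(k, n, p):
    """Probability of exactly k positive results in n independent tests, formula (1)."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


# Illustrative value of p only (not an estimate from the paper): n = 5 readings.
probabilities = [prob_k_positive(k, 5, 0.5) for k in range(6)]
print(probabilities)   # the six values add up to 1
print(probabilities[2])  # 0.3125
```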


Our further general assumption will be that the value of the probability 
p depends on the intensity of the disease. Thus, we will assume that 
the persons entirely free of the disease in question have a value of p 
which is smaller than that corresponding to the persons affected by the 
disease. Furthermore, if two persons A and B both suffer from the 
disease and A is more seriously affected than B, then the value of p 
corresponding to A will exceed that corresponding to B. 

The above assumptions are common to all the models considered 
below. These models differ from one another by the particular assump- 
tions made regarding the distribution of p from one individual to 
another within the population studied. It must be realized that any 
such assumption that can be made will most probably be false. The 
problem is to find assumptions that are good enough for practical 
purposes. 

The first study of the kind outlined, known to the author, is that 
of Hugo Muench [1].* The present paper follows the lines of the study 
of J. Neyman [2] which are also explained in his recent book [3]. The 
mathematical methods of estimation and of testing hypotheses used here 
are also those of Neyman [4]. The numerical examples discussed below 
are based on data regarding the screening for tuberculosis obtained by 
Drs. Birkelo, Chamberlain, Phelps, Schools, Zacks and Yerushalmy [5]. 

I should like to express my deepest gratitude to Professor J. Neyman 
for assigning this problem to me and for his kind assistance in drafting 
this paper. I am also indebted to Dr. Elizabeth L. Scott and Professor 
Joseph L. Hodges, Jr. for their constant help and valuable suggestions. 


* Numbers in square brackets refer to references at the end of this paper. 




2. FIVE MODELS OF POPULATION DISTRIBUTION 


The models which we are going to outline in this section concern 
the hypothetical distributions of individuals in a population with respect 
to the intensity of illness of a given disease. In the preceding section 
we introduced the probability p of detecting the evidence of disease in 
a single test, and assumed that the value of this probability corresponds 
to the intensity of illness. Therefore, our models are characterized by 
the postulated distribution of the population according to the value of 


this probability p. 


Model I. The first model, which will be referred to as Model I, is
discussed in detail in [2]. Under this model, the population is postulated
to consist of three mutually exclusive categories of individuals:
a proportion $\pi_1$ of individuals “entirely free” from the given disease,
with the same constant probability $p = p_1$ (say) of positive reading in
a single diagnostic test; a proportion $\pi_2$ of “moderately affected”
individuals with the probability $p = p_2$; and a proportion $\pi_3 = 1 - \pi_1 - \pi_2$
of “very sick” individuals whose sickness is certain to be detected. The
actual values of these parameters, $\pi_1$, $\pi_2$, $\pi_3$, $p_1$ and $p_2$ are unknown
and are to be determined from the observations. However, we keep in
mind the idea that $p_1$ is a small number and that the value of $p_2$ must
lie somewhere between $p_1$ and unity. Graphic representations of the
model in frequency distribution form and in cumulative distribution
form are shown in Fig. 1 and Fig. 2, respectively.

[Fig. 1. Model I in frequency distribution form. Fig. 2. Model I in
cumulative distribution form. Horizontal axis in both panels: from
“healthy” to “very sick.”]
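
To make the structure of Model I concrete, the following Python sketch (an illustration added here, with hypothetical function names) computes the expected number of persons with k positive readings as a mixture of two binomial components and a point mass at k = n. The numerical values are those read from Table 2 ($\pi_1$ = .931, $\pi_2$ = .049, $p_1$ = .007) and from Section 5 ($p_2$ = .535) of this paper; since they are rounded, the output can be expected to agree with the Model I column of Table 2 only to within about one unit.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k positive results in n independent tests."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


def model_one_probability(k, n, pi1, pi2, p1, p2):
    """Probability of k positive readings under Model I (three categories)."""
    pi3 = 1.0 - pi1 - pi2                  # "very sick": detected with certainty
    certain = 1.0 if k == n else 0.0
    return pi1 * binomial_pmf(k, n, p1) + pi2 * binomial_pmf(k, n, p2) + pi3 * certain


# Rounded estimates as printed in the paper (Table 2 and Section 5).
N, n = 1256, 5
expected = [N * model_one_probability(k, n, 0.931, 0.049, 0.007, 0.535)
            for k in range(n + 1)]
for k, value in enumerate(expected):
    print(k, round(value, 1))
```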




Model II. Model I, as Professor Neyman pointed out, is an over- 
simplification of the true situation. In fact, he made the following 
remarks [2, p. 1450] about the desirability of a graduation of the degree 
of moderate sickness: 


“. . . In reality we may expect that the subdivision of human populations 
is much finer than is postulated here [i.e., Model I] and that the category 
of ‘moderately affected’ splits itself into a continuous graduation of the 
intensity of illness, from very slight to very heavy, with the probabilities of 
illness being detected increasing gradually from 1 — @ [in the notation of the 
present paper, 1 — is p,] to unity.” 


[Fig. 3. Model II in frequency distribution form. Fig. 4. Model II in
cumulative distribution form. Horizontal axis in both panels: from
“healthy” to “very sick.”]


A model of this kind was constructed by assuming an even spread 
of the moderately sick group. Figs. 3 and 4 indicate the pattern of this 
model. Notice that although we now postulate the existence of an 
entire range of the intensity of illness we restrict Model II by postu- 
lating that the proportion of individuals at each intensity of moderately 
sick is the same. 


Model II(a). In Model II, $\pi_3$ represents the proportion of individuals
in the population whose probability of being declared “positive” is unity.
These are the individuals who have been labeled “very sick.” A
hypothesis may be formulated that there are no such persons, that is, $\pi_3 = 0$.
The resulting model is labeled Model II(a). The graphic description
of the model is similar to Figs. 3 and 4 except that $\pi_1 + \pi_2 = 1$.


Model III. In the previous models, for simplicity, we have made the 
assumption that human populations are divisible into separate categories. 




This assumption, in certain instances, may not be in accord with the 
real phenomena. It seems more desirable to consider the whole popu- 
lation as a single group and look for a flexible function to describe the 
distribution of the people in a population so that there is enough room 
for the model to vary according to whatever the machinery that produces 
the observations may happen to be. At the suggestion of Professor 
Neyman, a Pearson Type I curve, known as the incomplete beta func- 
tion, was investigated. The result leads to the formation of a new 
model to be labeled Model III. Under this model, the whole population 
distribution is assumed to follow some incomplete beta function. 


[Fig. 5. Model III in frequency distribution form (vertical axis:
probability density). Fig. 6. Model III in cumulative distribution form.
Horizontal axis in both panels: from “healthy” to “very sick.”]


The incomplete beta function involves two unknown constants, $m_1$
and $m_2$ (say). The shape of the curve changes considerably as the
constants change, so that a graphic description of the model is not avail- 
able until it is applied to the observations. However, as fitted to the 
data in the next section, the model has the forms shown in Figs. 5 and 6. 

If in some practical cases there is evidence of the existence of either 
a separate group of “entirely healthy” persons or of “very sick” 
persons, or both, in human populations, we may consider modified forms 
of this model by singling out either one or both of the extreme categories. 
If such cases do happen, we suggest the hypotheses that the probability 
is zero that an “entirely healthy ” person is declared to be “ positive ” 
and the probability is unity for a correct positive diagnosis on a “ verv 
sick ” patient. 




Model IV. A simple model for computation, and one which imposes
little restraint on the probabilities, is to permit masses of probability
at equally spaced points. In fact, we may divide the whole interval of
p from zero to unity into t (say) equal segments, with t < n, and let
the probability p assume the values j/t, for j = 0, 1, ..., t. These,
then, are the only values that we allow for p, but we impose no
restrictions on the proportion of individuals who are affected at any particular
intensity of the disease. Model IV is the converse of Model II in this
respect, since Model II allows p to take any value between $p_1$ and unity
but restricts the proportion at each intensity of moderately sick to be the
same.

The unknown constants involved in Model IV are the corresponding
proportions, $\pi_j$ (say), of individuals with the probabilities $p_j = j/t$. We
may regard the first group of individuals with proportion $\pi_0$ and
probability $p_0 = 0/t = 0$ as the category of “entirely healthy” people, the
last group with proportion $\pi_t$ and the probability $p_t = t/t = 1$ as the
“very sick” category and the intermediate groups as categories of
“moderately sick” individuals varying from slightly affected to very
heavily affected.

[Fig. 7. Model IV in frequency distribution form. Fig. 8. Model IV in
cumulative distribution form. Horizontal axis in both panels: from
“healthy” to “very sick.”]

For example, with n = 5 readings on the photographs taken from
each individual, there are sufficient degrees of freedom to allow
probabilities at $p_0 = 0$, $p_1 = 1/4$, $p_2 = 1/2$, $p_3 = 3/4$ and $p_4 = 1$. The
corresponding proportions $\pi_0$, $\pi_1$, $\pi_2$, $\pi_3$ and $\pi_4 = 1 - \pi_0 - \pi_1 - \pi_2 - \pi_3$ are
to be determined from the data. This model is shown in Fig. 7 and
Fig. 8.




3. OBSERVED DATA USED FOR ANALYSIS 


The material for the present study is from the data collected under 
the direction of Professor J. Yerushalmy [5]. His principal object
was a comparison of the relative effectiveness of various X-ray techniques 
for detecting the evidence of tuberculosis. In the setup, a sample of 
1256 persons was X-rayed on four X-ray photographs by different 
techniques. The four sets of photographs were interpreted independently 
by five radiologists and chest specialists with a common standard of 
interpretation. 

For the purpose of the present study, we limit the material to the 
five independent readings on the whole set of celluloid films.* The result 
of these readings leads to the frequency distribution shown in Table 1. 


TABLE 1 


Observed frequency distribution of readings of N = 1256 photographs, 
each by n = 5 independent readers 


NUMBER OF POSITIVE READINGS, k     NUMBER OF PERSONS SO DECLARED
0                                  1125
1                                    47
2                                    23
3                                    17
4                                    17
5                                    27
Total                              1256


When analyzing the data, we again make the assumption that a 
constant probability of a positive reading is attached to each person 
throughout all the five interpretations. This assumption implies that 
the five specialists are considered to be equally proficient. 


4. COMPARISON BETWEEN EXPECTED AND OBSERVED FREQUENCIES 


The five hypothetical models are fitted to the data quoted in the 
preceding section. The results appear in Table 2. 


* The X-ray techniques are (1) 35-mm. photofluorogram of the chest, (2)
4” × 10” stereophotofluorogram, (3) roentgenogram on 14” × 17” paper negative
and (4) 14” × 17” celluloid film. The celluloid films were circulated twice
among the readers for interpretations. We adopt the first interpretations for
our analysis.




TABLE 2 


Observed and expected number of positive readings on N = 1256 X-ray 
photographs, each by 5 independent readers 


NUMBER OF
POSITIVE        OBSERVED       EXPECTED FREQUENCIES, UNDER MODELS
READINGS k      FREQUENCIES    I         II        II(a)     III       IV
0               1125           1129.7    1125.5    1127.1    1122.6    1127.7
1                 47             47.6      47.3      47.4      41.1      38.9
2                 23             18.4      19.0      20.6      24.1      28.7
3                 17             20.6      18.7      20.3      19.4      18.0
4                 17             11.8      18.7      20.3      19.2      15.6
5                 27             27.9      26.8      20.3      29.6      27.1
Total           1256           1256.0    1256.0    1256.0    1256.0    1256.0
$\chi_0^2$                        2.844     1.043     3.211     1.690     3.004
Number of d. f.                 1         2         3         3         1
$\Pr(\chi^2 \ge \chi_0^2)$                .092      .595      .363      .641      .083
F,=931 #,==.905 7,=.903 *,=.879 
#,=.049 %,=.089 #,=.097 
#,=.020 7,=.006 7,=.000 
P:=.007 p,=.0051 p,=.0049 m,——.9669 #,—.029 
te Px 1 1 1 


The expected frequencies were calculated from the BAN estimates
which determine the exact shape of the respective models. The values of
$\chi_0^2$ show the discrepancies between the expected and observed frequencies,
while the figures in the row of “$\Pr(\chi^2 \ge \chi_0^2)$” tell us more precisely
the goodness of fit of the various models to the observed data. For example,
under Model I, the reader will find the number .092. This figure means
that if Model I describes the true population, the probability that the
value of $\chi^2$ will exceed 2.844 due to chance only is .092. Using $\alpha = .05$
as the level of significance, we find no reason for rejecting Model I.
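
The tail probabilities in the last row of Table 2 can be recomputed from the tabulated values of $\chi_0^2$ and the numbers of degrees of freedom. The Python sketch below is an illustration added here; the function name is hypothetical. It uses the known closed forms of the $\chi^2$ upper tail for one and two degrees of freedom and the standard recurrence that raises the number of degrees of freedom by two. Its results agree with the tabulated probabilities to within a few units in the third decimal.

```python
import math


def chi_square_upper_tail(x, degrees_of_freedom):
    """Pr(chi-square >= x) for a positive integer number of degrees of freedom."""
    if degrees_of_freedom < 1:
        raise ValueError("degrees of freedom must be a positive integer")
    half_x = x / 2.0
    if degrees_of_freedom % 2 == 1:
        nu = 1
        tail = math.erfc(math.sqrt(half_x))   # closed form for 1 d.f.
    else:
        nu = 2
        tail = math.exp(-half_x)              # closed form for 2 d.f.
    # Recurrence: Q(x; nu + 2) = Q(x; nu) + (x/2)^(nu/2) e^(-x/2) / Gamma(nu/2 + 1)
    while nu < degrees_of_freedom:
        tail += half_x ** (nu / 2.0) * math.exp(-half_x) / math.gamma(nu / 2.0 + 1.0)
        nu += 2
    return tail


# (chi-square value, degrees of freedom) for Models I, II, II(a), III, IV in Table 2.
table_two = [(2.844, 1), (1.043, 2), (3.211, 3), (1.690, 3), (3.004, 1)]
for chi_square, d_f in table_two:
    print(chi_square, d_f, round(chi_square_upper_tail(chi_square, d_f), 3))
```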

Although the values of $\Pr(\chi^2 \ge \chi_0^2)$ vary from .083 to .641, which
may indicate a preference among the models for these particular data,
yet, with the five per cent level of significance, all the models appear
acceptable and one is tempted to presume that, whichever of them is
adopted for the evaluation of the success of a mass survey for tuberculosis
using the X-ray technique, the results of such evaluation will be
essentially the same. If such be the case, then the choice of the model to
use would be based on the relative simplicity and ease of computations.
In the next section we study the question of the consequences to be
expected from adopting a false model.


5. IMPORTANCE OF DISTINGUISHING ONE MODEL FROM ANOTHER 


The following device will show us the undesirable consequences of 
ignoring the difference between models. Suppose that we are considering 
a mass survey of a population for the purpose of detecting those persons 
who are tubercular. Realizing, from Professor Yerushalmy’s work [5, 
6], that for many sick persons there is a chance that a single diagnostic 
test will fail to detect the illness, we plan to have each person examined 
a number n* times and passed only if all the outcomes are negative. 
The question is: how large should n* be? The “ very sick” persons, for 
whom the probability of detecting the evidence of disease is at or near 
unity, are almost certain to be picked out even with n* = 1. Therefore,
we may consider only the “ moderately sick ” persons. 

Suppose that we wish our survey to detect at least a specified
proportion, say $\varepsilon$, of the moderately sick. If we adopt a specific model of
the distribution of p and assume that the constants involved are
satisfactorily estimated from a preliminary survey, then the requisite value
of n* is easily found. Assume, for example, that Model I with the
constants given in Table 2 satisfactorily represents the distribution of p
among a population screened for tuberculosis and that a person will be
classified as “suspect” if at least one of the n* independent diagnoses
produces the result “positive.” Then every “moderately sick” person
has the same probability of being detected in one diagnostic test, namely,
$p_2 = .535$. With n* = 1, in the long run, about 53.5 per cent of the
“moderately sick” will be picked out. With n* = 2, this proportion
will be $1 - (.465)^2 = .784$ or 78.4 per cent of the group. For a
preassigned $\varepsilon$, the proper n* can be obtained by solving the inequality

$$\varepsilon \le 1 - (.465)^{n^*}$$

for n*.
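
The solution of this inequality may be found by direct search. The following Python sketch is an illustration added here, with hypothetical function names; it takes $p_2 = .535$ from the text and $\varepsilon = .90$ as the requirement used in Table 3.

```python
def detected_proportion(p, n_tests):
    """Long-run proportion detected when each single test is positive with probability p."""
    return 1.0 - (1.0 - p) ** n_tests


def required_tests(p, epsilon):
    """Smallest n* with 1 - (1 - p)^n* >= epsilon, found by direct search."""
    if not 0.0 < p <= 1.0 or not 0.0 <= epsilon < 1.0:
        raise ValueError("need 0 < p <= 1 and 0 <= epsilon < 1")
    n_tests = 1
    while detected_proportion(p, n_tests) < epsilon:
        n_tests += 1
    return n_tests


print(required_tests(0.535, 0.90))               # 4
print(round(detected_proportion(0.535, 2), 3))   # 0.784
print(round(detected_proportion(0.535, 4), 3))   # 0.953
```

Three tests give only about 89.9 per cent, which is why the search stops at n* = 4.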
A similar, although a little less simple, procedure is applicable to
the other models considered, provided we make a precise definition of
“moderately sick.” In Models II and II(a) we treat as “moderately
sick” all those persons for whom p is greater than $p_1$ but less than unity.




In Model IV the “moderately sick” are all those persons for whom
0 < p < 1. With these conventions, the left-hand side of Table 3 gives
the values of n* required to detect at least the proportion $\varepsilon = .9$ of
“moderately sick” persons in populations in which the distribution
of p follows the model considered with constants given in Table 2. The
right-hand side of Table 3 gives the long run proportions of the
“moderately sick” which would be detected with each particular value of n*.


TABLE 3 


Importance of distinguishing between models 


Number n* of diagnostic tests required to detect at least 90 per cent
of moderately sick persons

MODEL     n*
I         4
II        9
II(a)     9
IV

Proportion of moderately sick persons detected, in relation to model
and n*

          MODEL
n*        I        II       II(a)    IV
4         .953     .804     .804     .769
7         .995
9         .999     .905     .904


Table 3 illustrates the following situation: Assume that Yerushalmy’s 
data are taken as a basis for planning a large scale X-ray survey intended 
to detect at least ninety per cent of all persons moderately affected by 
tuberculosis. The question asked is: how many independent X-ray 
examinations should be made on each person screened? The answers 
are given in the left-hand side of Table 3 and will be seen to depend 
considerably on the particular model of the distribution of p postulated. 
If we postulate Model I, then n* = 4 independent screenings will be 
sufficient. However, Model I may be very inaccurate and the actual 
distribution of p may conform more closely to Model II or II(a). If 
this is the case, then, to detect at least ninety per cent of the
“moderately sick” it will be necessary to screen each person at least
nine times! What would happen if only n* = 4 independent screenings
were made is indicated in the first line in the right-hand side of
Table 3. If the true distribution of p conforms to Model I, the
proportion of “moderately sick” detected will be 95.3 per cent; if the
distribution of p conforms to Model II, this proportion will be 80.4
per cent, etc.

The conclusions suggested by Table 3 are as follows: When a large
scale survey intended to control tuberculosis is planned to detect some
90 per cent of the moderately affected, a project which detects only 
77 per cent (Table 3, n* = 4, Model IV) may conceivably be considered 
as only mildly disappointing. If such be the case, then Table 3 suggests 
that, while a more precise knowledge of the true distribution of p is 
desirable, it is not essential. On the other hand, it is conceivable that, 
at least in some cases, the difference between the hoped for ninety per 
cent of moderately sick detected and the possible seventy-seven per cent 
will be considered important. In cases of this kind efforts must be made 
to establish the form of the distribution of p more precisely than appears 
possible using the data reproduced in Table 1. 

Quite apart from the above practical conclusion, it is interesting to
inquire why five different models, which imply substantially different
values of n*, appear to fit the same data about equally satisfactorily.
One obvious answer is that, although the volume of Professor
Yerushalmy’s data (N = 1256) is substantial, it is still insufficient.
Another, a little less obvious, suggestion is that the difficulty in
distinguishing between the models considered is connected with the
relatively small number n = 5 of independent X-ray examinations
underlying the data in Table 1. It is possible that, if the preliminary
survey were based on, say, n = 8 independent tests applied to each
individual in the sample, then the results would indicate more clearly
the true nature of the distribution of p. These problems are discussed
in the next section.


6. PROBLEMS OF POWER AND OF DESIGN OF A PRELIMINARY SURVEY 


The answers to the two questions mentioned in the closing paragraph
of Section 5 are connected with the concept of power of a test
introduced by Neyman and Pearson. We consider a statistical hypothesis
H, for example, that the distribution of p corresponds to Model I with
unspecified values of the constants. We are going to obtain a sample of
N = 1256 individuals and use the sample in a prescribed way to compute
$\chi^2$. We make a rule that, if the computed value of $\chi^2$
exceeds a certain limit $\chi_\alpha^2$ (the tabled value adjusted to
the level of significance $\alpha$ and to the number of degrees of
freedom), then we shall reject H. In so doing we are certain that,
should it happen that H is true, the probability of an unjust rejection
of H is approximately equal to $\alpha$. This is the essence of what we
call a test of the statistical hypothesis H. It is clear that this
hypothesis may be false and yet the criterion $\chi^2$ may fail to
exceed $\chi_\alpha^2$, in which case H will not be rejected.

Let H’ denote a simple statistical hypothesis, alternative to the
hypothesis tested H. The power of a test of the hypothesis H against
the alternative H’ is defined as the probability, computed on the
assumption that H’ is true (and thus that H is false), that the test
under consideration will reject H. If, in a particular case, the power
$\beta$ is a small number, then the failure to reject H when H’ is true
should not be surprising.

The power of the $\chi^2$ procedure applied above to test the adequacy
of the five different models of the distribution of p depends in a
complicated way on the values of the various parameters involved in
each model. The approximate power may be computed using a procedure
taken from Neyman’s lectures* and the tables of the non-central
$\chi^2$ computed by Evelyn Fix [7].
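
For the special case of a single degree of freedom, the power based on
the non-central $\chi^2$ has a closed form in terms of the normal
distribution, and a short Python sketch shows how it responds to the
level of significance. This is an illustration under stated
assumptions, not the procedure of the paper: the function names are
hypothetical, one degree of freedom is assumed, and the noncentrality
parameter `lam` is an input rather than a quantity derived from any of
the models.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def power_one_df(lam: float, alpha: float) -> float:
    """Power of a chi-square test with 1 degree of freedom at level alpha
    when the statistic is non-central chi-square with noncentrality lam.

    With 1 d.f. the statistic is distributed as (Z + sqrt(lam))**2 for a
    standard normal Z, so H is rejected when |Z + sqrt(lam)| exceeds the
    two-sided normal critical point."""
    z_crit = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    root = lam ** 0.5
    return _STD_NORMAL.cdf(root - z_crit) + _STD_NORMAL.cdf(-root - z_crit)


def noncentrality_for_power(beta: float, alpha: float) -> float:
    """Noncentrality at which the power equals beta (beta > alpha),
    found by bisection, since power_one_df increases with lam."""
    low, high = 0.0, 100.0
    for _ in range(200):
        mid = (low + high) / 2.0
        if power_one_df(mid, alpha) < beta:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


if __name__ == "__main__":
    for alpha in (0.01, 0.05):
        lam_80 = noncentrality_for_power(0.8, alpha)
        lam_90 = noncentrality_for_power(0.9, alpha)
        print(alpha, round(lam_80, 2), round(lam_90, 2))
```

In the usual large-sample approximation the noncentrality grows in
proportion to the number N of individuals, so the ratio of two such
noncentralities is also the ratio of the sample sizes needed to reach
the two powers.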

Table 4 gives the approximate power of the test of the hypothesis
that the distribution of p corresponds to Model I when the true model
is Model II and when it is Model III. It also gives the power of the
test of the hypothesis that the distribution of p corresponds to Model
II when the true model is Model I. In all cases it is assumed that the
number of observations is that of Dr. Yerushalmy, N = 1256, and the
power is computed for two different levels of significance,
$\alpha = .01$ and $\alpha = .05$, and the constants relating to the
true model are those given in Table 2.

TABLE 4

Power of test of a model against an alternative

MODEL     TRUE      LEVEL OF            POWER OF
TESTED    MODEL     SIGNIFICANCE, α     TEST, β

I         II        .01                 .09
                    .05                 .24

I         III       .01                 .20
                    .05                 .41

II        I         .01                 .12
                    .05                 .29

* The asymptotic power of the $\chi^2$ test for a simple hypothesis was
developed by C. Eisenhart [8, 9]. In a paper by A. Wald [10], a more
general case has been discussed.


It is seen that in all cases the power is rather small, always less
than one-half. Therefore, it is not surprising that the application of
the $\chi^2$ test failed to reject any of the five models considered.

TABLE 5

Number of individuals required to be X-rayed for various values of the
power of test of a model against an alternative, for n = 5

MODEL   TRUE   LEVEL                       POWER OF TEST, β
TESTED  MODEL  OF SIG. α  .1    .2    .3    .4    .5    .6    .7    .8    .9

I       II     .01        1353  2430  3400  4358  5360  6466  7764  9436  12020
               .05        ..    ..    1663  2352  3103  3958  4986  6341  8490

I       III    .01        694   1246  1743  2234  2748  3315  3980  4837  6161
               .05        ..    ..    853   1206  1591  2029  2556  3250  4352

II      I      .01        1071  1835  2501  3146  3812  4539  5384  6461  8111
               .05        291   806   1293  1784  2308  2892  3585  4485  5890

(.. marks an entry that is illegible in the source.)

Table 5 refers to the tests of the same Models I and II contrasted
with the same alternatives as in Table 4 and answers the question of
how large the number N of observations in the preliminary survey should
be, always using the same number n = 5 tests per individual, so that
the power $\beta$ has a preassigned value from .1 to .9 given in the
title line of the table. It is seen that, in order to have a reasonable
chance, say $\beta = .8$, of rejecting Model I when the true
distribution of p follows either Model II or Model III, one needs a
multiple of the number of observations available to Dr. Yerushalmy.
This applies especially to the case where p follows the distribution of
Model II.

The practical conclusion drawn from Table 5 is that, if in the
preliminary survey one insists on subjecting each individual to n = 5
independent tests, then, in order to obtain data which could give
reasonably accurate information regarding the distribution of p, the
size N of the sample in the preliminary survey should be of the order
of 7,000 or more.

Now we may turn to the other question raised in Section 5, that
regarding the influence of the number n of independent tests applied to
each individual of the preliminary survey.


TABLE 6

Power of test of Model I against Model III for different values of n
and two levels of significance

LEVEL OF SIG.    n = 5    n = 6    n = 8
α = .01          ..       .54      >.95
α = .05          .41      .76      >.95

(.. marks an entry that is illegible in the source.)

Table 6 gives the power of the test of Model I corresponding to
the assumption that the true distribution of p follows Model III with
the constants of Table 2, while the number of individuals tested is
that of Yerushalmy, N = 1256. It is seen that with the increase of n
the power of the test increases very rapidly. In fact, if each of the
1256 persons screened were subjected to n = 8 independent tests, one
could take for granted that Model I could be distinguished from Model
III. The same circumstance is illustrated in Table 7, somewhat
analogous to Table 5, giving the values of N with which the power of
the test of Model I against Model III, with constants of Table 2, has
the values .1, .2, ..., .9. It is seen that with n = 8 a sample of 1000
individuals would be ample to reject Model I when the true model is
Model III.


TABLE 7

Number of individuals required to be X-rayed for various values of the
power of test of Model I against Model III for n = 5, 6 and 8
and two significance levels

                                 POWER OF TEST, β
                   .1    .2    .3    .4    .5    .6    .7    .8    .9

α = .01    n = 5   694   1246  1743  2234  2748  3315  3980  4837  6161
           n = 6   330   ..    ..    ..    1173  1396  1656  1987  2495
           n = 8   156   257   343   424   507   596   699   829   1026

α = .05    n = 5   177   ..    853   1206  ..    ..    2556  3250  4352
           n = 6   90    248   398   549   710   890   1103  1379  1812
           n = 8   46    119   185   250   318   393   480   591   763

(.. marks an entry that is illegible in the source.)


The practical conclusion drawn from the above results is that when 
planning a preliminary survey designed to estimate the distribution of 
p in the population studied and to determine the number, say n*, of 
independent tests to be applied to each individual in the subsequent
mass survey, it is essential to see that the number n of independent
tests applied to each individual of the preliminary survey be large 
enough to reflect the properties of the distribution of p. Specifically, 
it appears that n = 8 would be ample. Assuming n = 8 and N = 1000, 
the radiologists would have to make nN = 8000 readings. On the other 
hand, with n = 5, in order to have a similar chance of rejecting Model I 
when the true model is Model III, one would have to have about 
N = 6000 and the total number of readings would be 30,000. 

Of course, the models studied do not exhaust all the possibilities, but 
it is expected that the figures obtained are applicable to a broad variety 
of possible situations. 

The effect on the power of the increase in the value of n may seem
unexpected. The intuitive reason for this increase is illustrated in
Fig. 9. The upper part of Fig. 9 refers to the case n = 5 and the
lower part to the case n = 8. In each case, the height of any rectangle
drawn represents the probability $p_k$ of exactly k “positive” outcomes
out of the n tests applied to an individual randomly selected from the
population. The left-hand side of the figure was constructed assuming
that the distribution of p in the population follows Model I with
constants given in Table 2. The right-hand side of the figure is
computed assuming that the population distribution of p corresponds to
Model III, again with constants exhibited in Table 2. It will be seen
that, although there is always a distinction between the values of
$p_k$ implied by the two models, in the case n = 8 this distinction is
much stronger than in the case n = 5.

[Figure: left panel, Model I; right panel, Model III]

FIG. 9. EXPECTED FREQUENCIES UNDER MODEL I AND MODEL III
FOR n = 5 AND n = 8


7. CONCLUSIONS 


1. The paper deals with the situation where each individual of a popu- 
lation is going to be subjected to n* replications of a medical test 
designed to identify those individuals who are affected by a specified 
disease. 

2. When planning a survey of this kind it is essential to see that n* is 
large enough to insure the detection of a substantial proportion of 
the individuals actually affected by the disease. 

3. The value of n* insuring that the proportion of individuals actually
affected by the disease identified in the survey equals (approximately)
a prescribed number $\varepsilon$ depends on the distribution within
the population of the probability p that a single application of the
test will give the result “positive.”


4. In order to estimate this distribution, a preliminary survey is neces- 
sary. In the preliminary survey a random sample of N individuals 
of the population should be screened for the disease with n indepen- 
dent replications of the same test. 

5. In order to achieve a reasonably precise estimate of the distribution
of p, the number n of replications of the test and also the number N
of individuals in the preliminary survey should be sufficiently large.
It is shown that, in the conditions of the survey for tuberculosis as
reflected in the data collected by Dr. Yerushalmy, the number n = 5
and the number N = 1256 of individuals in the sample are too small
to give a sufficiently accurate estimate of the distribution of p. On
the other hand, it appears that n = 8 and N = 1000 would be
sufficient.




7. APPENDIX 


In this appendix we assemble a few formulae used in the main text 
of the paper. 


(A) Notation and basic formula 


n = number of independent applications of the same diagnostic
test T to an individual,

k = number of “positive” outcomes in n independent tests T
applied to the same individual,

N = number of individuals in the sample in the preliminary survey,

$N_k$ = number of those individuals in the sample for whom, out of
the n replications of test T, exactly k gave the answer
“positive,”

$q_k = N_k/N$, the observed relative frequency of exactly k positive
outcomes per individual,

p = probability that a single application of test T to a given
individual will give the outcome “positive.” It is postulated
that this probability p characterizes the state of health of the
individual, that it varies from one individual in the
population to another, but that for any given individual it stays
constant over all the n replications of the test T,

$p_k$ = idealization of the relative frequency $q_k$, that is, the
probability that n replications of the test T to an individual
selected at random from the population will give exactly k
“positive” outcomes, and

G(p) = cumulative distribution of p in the population considered.


The basic formula underlying all the results of the paper is that
connecting the probability $p_k$ with the distribution G(p). For a
given individual with a fixed value of p, the probability that n
independent replications of test T will give exactly k positive results
is given by the familiar binomial formula

$$C_n^k\, p^k (1 - p)^{n-k}. \tag{1}$$

If the individual considered is selected at random from a population
in which the distribution of p is given by the function G(p), then the
probability $p_k$ of obtaining exactly k positive outcomes of the test
is, simply, the average of (1) with weights determined by G(p),

$$p_k = C_n^k \int_0^1 p^k (1 - p)^{n-k}\, dG(p). \tag{2}$$

Certain models postulate that the distribution G(p) is discontinuous
so that p can assume only a finite set of different values
$0 \le p_1 < p_2 < \cdots < p_t \le 1$. Then for each j = 1, 2, ..., t,
the distribution function G(p) specifies the proportion $\pi_j$ of the
individuals in the population considered for which $p = p_j$,

$$\pi_j = P\{p = p_j\}.$$

In this case the integral in (2) reduces to the sum

$$p_k = C_n^k \sum_{j=1}^{t} \pi_j\, p_j^k (1 - p_j)^{n-k}.$$
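
For a distribution G(p) concentrated on finitely many points, the sum
is easy to evaluate directly. The sketch below is a minimal
illustration; the function name and the three-point distribution are
hypothetical (only the value .535 appears in the text), and they are
not the estimates of Table 2.

```python
from math import comb


def mixture_pk(n, values, weights):
    """Probabilities p_k, k = 0..n, of exactly k positive outcomes in n
    tests when p equals values[j] with probability weights[j]."""
    if abs(sum(weights) - 1.0) > 1e-12:
        raise ValueError("weights must sum to 1")
    return [
        comb(n, k) * sum(w * p**k * (1.0 - p) ** (n - k)
                         for p, w in zip(values, weights))
        for k in range(n + 1)
    ]


if __name__ == "__main__":
    # Hypothetical distribution: mostly healthy, some moderately sick,
    # a few for whom every test is positive.
    pk = mixture_pk(5, values=[0.02, 0.535, 1.0], weights=[0.90, 0.08, 0.02])
    print([round(x, 4) for x in pk])
    print(round(sum(pk), 12))
```

Whatever the weights, the n + 1 probabilities add up to unity, which is
the identity used later in the estimation procedure.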


In the main body of the paper, formula (2) is used in two different
ways. First it is applied to deduce the expressions for the
probabilities $p_k$ in terms of the several parameters which are
inherent in the particular models considered and have unspecified
values. These parameters must then be estimated from the numbers $q_k$
obtained from the preliminary survey. The second way in which formula
(2) is used refers to the probability that an individual of the
population moderately affected by the disease will be “identified” in
the mass survey as a result of n* replications of the test T. In
Section 5 it was agreed that at least one “positive” outcome out of the
n* tests will classify the individual as a suspect. More generally,
there will be a number $k_0 \ge 1$ such that, if out of the n* tests
$k_0$ (or more) give positive results, the individual will be
classified as suspect. Denote by P* the probability that an individual
moderately affected by the disease will be classified as a suspect,
when the test T is applied n* times. Further, let the two values $R_1$
and $R_2 > R_1$ serve to define a “moderately sick” person. That is, a
moderately sick person is one for whom $R_1 < p \le R_2$. It is easy to
see that

$$P^* = \frac{\sum_{k=k_0}^{n^*} C_{n^*}^k \int_{R_1}^{R_2} p^k (1 - p)^{n^*-k}\, dG(p)}{\int_{R_1}^{R_2} dG(p)}, \tag{3}$$

where both integrals extend over $R_1 < p \le R_2$.

Once all the parameters in G(p) are satisfactorily estimated, formula
(3) is easy to evaluate. Whenever necessary, a similar formula can be
computed representing the probability that a healthy person will be
classified as a suspect.
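
A minimal sketch of this detection probability for a discrete G(p)
follows. It assumes that P* is the average, over the moderately sick
only, of the chance of at least $k_0$ positives; the function names and
the three-point distribution are hypothetical.

```python
from math import comb


def prob_suspect(p, n_star, k0=1):
    """Probability of at least k0 positive outcomes in n_star tests."""
    return sum(comb(n_star, k) * p**k * (1.0 - p) ** (n_star - k)
               for k in range(k0, n_star + 1))


def p_star(values, weights, n_star, r1, r2, k0=1):
    """P* for a discrete distribution: prob_suspect averaged over the
    points with r1 < p <= r2, weighted by their share of that group."""
    sick = [(p, w) for p, w in zip(values, weights) if r1 < p <= r2]
    mass = sum(w for _, w in sick)
    if mass == 0:
        raise ValueError("no probability mass between r1 and r2")
    return sum(w * prob_suspect(p, n_star, k0) for p, w in sick) / mass


if __name__ == "__main__":
    values, weights = [0.02, 0.535, 1.0], [0.90, 0.08, 0.02]
    for n_star in (1, 2, 4):
        print(n_star, round(p_star(values, weights, n_star, 0.1, 0.9), 3))
```

With a single point in the moderately sick range and $k_0 = 1$, the
result reduces to one minus the chance that all n* tests are negative.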




(B) Estimation of parameters in the distribution G(p) 


It is unfortunate that no procedure with optimum properties proved
for samples of fixed size is now available for the estimation of
parameters involved in the distribution G(p). Among the procedures with
the optimal asymptotic properties, the simplest to apply appears to be
Neyman’s modified minimum $\chi^2$ procedure with reduced conditions
[4]. The machinery of its application is as follows:

Assume that the distribution G(p) depends upon n - s parameters
$\theta_1, \theta_2, \ldots, \theta_{n-s}$ with unknown values. Formula
(2) expresses $p_k$ as a function of the same parameters, say

$$p_k = f_k(\theta_1, \theta_2, \ldots, \theta_{n-s}), \qquad k = 0, 1, 2, \ldots, n. \tag{4}$$

Obviously these functions must satisfy the identity

$$\sum_{k=0}^{n} f_k(\theta_1, \theta_2, \ldots, \theta_{n-s}) \equiv 1, \tag{5}$$

so that out of the n + 1 equations (4) there are only n independent
equations. The validity of the method was proved on the assumption
that the functions (4) have continuous partial derivatives of second
order.

The first step in Neyman’s procedure consists in eliminating the
parameters $\theta$ among some n of the equations (4). For example,
this may be achieved by selecting a group of n - s of these equations,
by solving them with respect to the $\theta$’s and by substituting the
solutions into the remaining s equations. Then the system of s
equations so obtained may be simplified. However, in many cases an
alternative procedure of elimination appears to be preferable.
Whichever method is used, the result is represented by a system of s
equations of the form, say

$$F_t(p) \equiv F_t(p_0, p_1, \ldots, p_n) = 0 \qquad \text{for } t = 1, 2, \ldots, s. \tag{6}$$


In some cases, equations (6) are linear in the p’s. Then they are
used in the next step in the process of estimation. Ordinarily, however,
equations (6) are not linear and cannot be replaced by an equivalent
linear system. In these cases we replace equations (6) by their
“reduced” form. Denote by $F_t(q)$ the result of the substitution

$$p_k = q_k, \qquad k = 0, 1, 2, \ldots, n,$$

into $F_t(p)$ and by $a_{tk}$ the result of the same substitution into
the expression of the derivative of $F_t(p)$ with respect to $p_k$,

$$a_{tk} = \left. \frac{\partial F_t}{\partial p_k} \right|_{p = q}.$$

Then write

$$F_t^*(p, q) = F_t(q) + \sum_{k=0}^{n} a_{tk}\,(p_k - q_k) = 0, \qquad t = 1, 2, \ldots, s. \tag{7}$$

Formula (7) represents the “reduced” form of equation (6). Obviously
$F_t^*(p, q)$ represents the sum of the first two terms of the Taylor
expansion of $F_t(p)$ about the point $p_k = q_k$ for
k = 0, 1, 2, ..., n.

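The next step in the estimation procedure consists in writing down
the “modified” $\chi^2$,

$$\chi^2 = N \sum_{k=0}^{n} \frac{(p_k - q_k)^2}{q_k},$$

and in determining the values of the $p_k$ which satisfy the obvious
condition $\sum_{k=0}^{n} p_k = 1$ and minimize $\chi^2$ under the side
conditions (6). If this is not feasible, then the minimization of
$\chi^2$ is made under the “reduced” conditions (7). This latter
problem is easy and the minimizing values of the p’s are given by the
formula,

$$p_k = q_k \left\{ 1 - \frac{1}{\Delta} \sum_{t=1}^{s} \sum_{r=1}^{s} \Delta_{tr}\, F_r(q)\, (a_{tk} - \bar a_t) \right\}, \tag{8}$$

where

$$\bar a_t = \sum_{k=0}^{n} a_{tk}\, q_k,$$

$$S_{tr} = \sum_{k=0}^{n} a_{tk}\, a_{rk}\, q_k - \bar a_t\, \bar a_r,$$

$$\Delta = \begin{vmatrix} S_{11} & S_{12} & \cdots & S_{1s} \\ S_{21} & S_{22} & \cdots & S_{2s} \\ \vdots & \vdots & & \vdots \\ S_{s1} & S_{s2} & \cdots & S_{ss} \end{vmatrix},$$

and $\Delta_{tr}$ is the cofactor of $S_{tr}$ in the determinant
$\Delta$. The minimum $\chi_0^2$ of the modified $\chi^2$ is then given
by the formula

$$\chi_0^2 = \frac{N}{\Delta} \sum_{t=1}^{s} \sum_{r=1}^{s} \Delta_{tr}\, F_t(q)\, F_r(q).$$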
When the values $p_k$ of (8) are obtained, the next step consists in
writing n - s equations of the form

$$f_k(\theta_1, \theta_2, \ldots, \theta_{n-s}) = p_k \tag{9}$$

and in solving them with respect to
$\theta_1, \theta_2, \ldots, \theta_{n-s}$. The solutions are the BAN
estimates of the n - s parameters $\theta$: as N is increased, they
are asymptotically normal random variables, each tends in probability
to the true value of the corresponding $\theta$ and, of all the
estimates with these properties, they have minimum asymptotic
variances. Unfortunately, thus far, little is known about the speed
with which these properties are approached by the actual distribution
of the estimates of $\theta$ as N is increased. Using a sampling
experiment the degree of approximation provided by the asymptotic
distribution was investigated only with respect to Model I. It was
found that with N = 1000 this approximation is satisfactory.
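
For a single side condition (s = 1) the minimization of the modified
$\chi^2$ under a reduced, that is linear, condition can be carried out
with one Lagrange multiplier, and the sketch below does exactly that.
It is an illustration rather than a transcription of the paper's
formulae: the function name, the coefficients `a` and the observed
frequencies `q` are hypothetical, and the condition is assumed to be
already linearized as F(q) + sum of a_k (p_k - q_k) = 0.

```python
def fit_one_reduced_condition(q, a, f_q, n_individuals):
    """Minimize N * sum((p_k - q_k)**2 / q_k) subject to
        sum(p_k) = 1   and   f_q + sum(a_k * (p_k - q_k)) = 0,
    where the observed frequencies q are positive and sum to 1.

    Returns the minimizing p and the minimum of the modified chi-square."""
    a_bar = sum(ak * qk for ak, qk in zip(a, q))               # q-weighted mean of a
    s = sum(ak * ak * qk for ak, qk in zip(a, q)) - a_bar**2   # q-weighted variance of a
    p = [qk * (1.0 - f_q * (ak - a_bar) / s) for ak, qk in zip(a, q)]
    chi2_min = n_individuals * f_q**2 / s
    return p, chi2_min


if __name__ == "__main__":
    # Hypothetical observed relative frequencies for n = 5 (six classes)
    q = [0.70, 0.12, 0.06, 0.04, 0.03, 0.05]
    # Hypothetical linear side condition F(p) = sum(a_k * p_k) = 0
    a = [0.0, 1.0, -2.0, 2.0, -1.0, 0.0]
    f_q = sum(ak * qk for ak, qk in zip(a, q))
    p_hat, chi2_min = fit_one_reduced_condition(q, a, f_q, n_individuals=1000)
    print([round(x, 4) for x in p_hat], round(chi2_min, 3))
```

The fitted values satisfy the side condition exactly and still add up
to unity; with several side conditions the scalar `s` becomes a matrix
and the division becomes the solution of a linear system.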

Once the BAN estimates of all the parameters are obtained, they can
be substituted into the expression of G(p) to provide an estimate of
the distribution of p in the population studied. Then, in turn, the
estimated distribution G(p) yields the estimates of the $p_k$ as in
formula (2).

In using the above method, one must realize that it involves several
arbitrary steps which will influence the final result. Thus, for example,
the system of the side conditions (6) may be replaced by any other
system provided it is equivalent to the original one. If the minimization
of $\chi^2$ is done under the system (6) of unreduced side conditions,
the result of the minimization is not influenced by passage from one
system of side conditions to a different equivalent one. However, if
the minimization is made using the reduced conditions (7), then the
final outcome depends on the nature of the selected system of
conditions (6).

Another element of arbitrariness is involved in the choice of the 
particular n—s equations (9) from which to compute the estimates 
of the parameters. This choice also influences the final results. 

The problem of which choices in the above have some property of
“bestness” is not yet clear and requires further investigation.
However, the following points must be borne in mind: (i) The choice in
the above steps must be made on other grounds than the outcomes of the
numerical computations based on a given sample. When this condition
is satisfied then the estimates of the parameters obtained by the method
described possess the asymptotic properties mentioned, independently of
the particular choices made; (ii) In selecting the system of side
conditions (6) to be “reduced,” one should consider the relative
“degree of non-linearity”: the closer the system (6) to a system of
linear equations, the better the prospect of precise estimates
obtainable using the reduced system (7).




After these preliminaries we give particular formulae relating to
each of the five models considered.

(C) Mathematical forms of the five models


Model I. Here the number of parameters involved is four. The
distribution G(p) is defined as follows:

$$G(p) = \begin{cases} 0 & \text{if } p < p_1, \\ \pi_1 & \text{if } p_1 \le p < p_2, \\ \pi_1 + \pi_2 & \text{if } p_2 \le p < 1, \\ 1 & \text{if } p = 1. \end{cases}$$

Further, using (2) we have

$$p_k = C_n^k \{ \pi_1 p_1^k (1 - p_1)^{n-k} + \pi_2 p_2^k (1 - p_2)^{n-k} \},$$

for k = 0, 1, 2, ..., n - 1,

$$p_n = 1 - \pi_1 (1 - p_1^n) - \pi_2 (1 - p_2^n).$$

The general form of the side conditions (6) is the following:

$$F_{k+1}(p) = u_{k+2}^3 + u_k u_{k+3}^2 + u_{k+1}^2 u_{k+4} - u_k u_{k+2} u_{k+4} - 2 u_{k+1} u_{k+2} u_{k+3} = 0,$$

for k = 0, 1, ..., n - 5, where $u_k = p_k / C_n^k$. The reduced side
conditions (7) are easily obtained.
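
Under Model I the quantities $u_k = p_k / C_n^k$ for k < n are sums of
two geometric sequences, so any 3 x 3 determinant built from five
consecutive u's must vanish, which is one way of writing a side
condition free of the four parameters. The sketch below checks this in
exact rational arithmetic. The function names and the parameter values
are hypothetical (only .535 is quoted in the text), and the
determinantal form is offered as an assumption consistent with the
formulae for $p_k$, not as the paper's own derivation.

```python
from fractions import Fraction as Fr
from math import comb


def model_one_pk(n, pi1, pi2, p1, p2):
    """p_0, ..., p_n under Model I: mass pi1 at p1, pi2 at p2, rest at 1."""
    pk = [comb(n, k) * (pi1 * p1**k * (1 - p1) ** (n - k)
                        + pi2 * p2**k * (1 - p2) ** (n - k))
          for k in range(n)]
    pk.append(1 - pi1 * (1 - p1**n) - pi2 * (1 - p2**n))
    return pk


def hankel3(u, k):
    """Determinant of [[u_k, u_k+1, u_k+2], [u_k+1, u_k+2, u_k+3],
    [u_k+2, u_k+3, u_k+4]]."""
    a, b, c, d, e = u[k:k + 5]
    return a * (c * e - d * d) - b * (b * e - c * d) + c * (b * d - c * c)


if __name__ == "__main__":
    n = 6
    pk = model_one_pk(n, Fr(17, 20), Fr(1, 10), Fr(1, 50), Fr(107, 200))
    u = [pk[k] / comb(n, k) for k in range(n)]       # u_0, ..., u_{n-1}
    print(sum(pk))                                   # total probability
    print([hankel3(u, k) for k in range(n - 4)])     # k = 0, ..., n - 5
```

Exact fractions are used so that the determinants come out as exact
zeros rather than as small floating-point residues.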

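Model II. The distribution G(p) depends upon three parameters and
is defined as follows:

$$G(p) = \begin{cases} 0 & \text{if } p < p_1, \\ \pi_1 + \pi_2 \dfrac{p - p_1}{1 - p_1} & \text{if } p_1 \le p < 1, \\ 1 & \text{if } p = 1. \end{cases}$$

Hence

$$p_k = C_n^k \left\{ \pi_1 p_1^k (1 - p_1)^{n-k} + \frac{\pi_2}{1 - p_1} \int_{p_1}^{1} p^k (1 - p)^{n-k}\, dp \right\} \tag{10}$$

for k = 0, 1, 2, ..., n - 1,

$$p_n = 1 - \pi_1 (1 - p_1^n) - \frac{\pi_2}{(n + 1)(1 - p_1)} \{ n (1 - p_1) - p_1 (1 - p_1^n) \}.$$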


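Since all the p’s must add up to unity, only the first n equations need
to be used for elimination of the parameters $\pi_1$, $\pi_2$ and
$p_1$. Divide the (k + 1)-st equation of (10) by $C_n^k$, and denote
$p_k / C_n^k$ by $u_k$. We have

$$u_k = \pi_1 p_1^k (1 - p_1)^{n-k} + \frac{\pi_2}{1 - p_1} \int_{p_1}^{1} p^k (1 - p)^{n-k}\, dp, \qquad k = 0, 1, \ldots, n - 1. \tag{11}$$

Let m be an integer, 0 ≤ m ≤ n - 1. Consider the equations (11) for
k = 0, 1, 2, ..., m, multiply the k-th equation by $C_m^k$ and then add
the m + 1 equations so obtained. The result is

$$V_m = \sum_{k=0}^{m} C_m^k u_k = \pi_1 (1 - p_1)^{n-m} + \frac{\pi_2}{1 - p_1} \int_{p_1}^{1} (1 - p)^{n-m}\, dp,$$

or, after effecting the integration and multiplying by n - m + 1,

$$(n - m + 1)\, \pi_1 (1 - p_1)^{n-m} + \pi_2 (1 - p_1)^{n-m} = (n - m + 1)\, V_m = W_m \ \text{(say)}, \tag{12}$$

for m = 0, 1, 2, ..., n - 1.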

Arrange the n equations (12) in groups of three, the first group
corresponding to m = 0, 1, 2, the second to m = 1, 2, 3, etc. The last
group will be composed of equations (12) with m = n - 3, n - 2 and
n - 1. There will be n - 2 such groups of equations and each of them
may be considered as a linear system of non-homogeneous equations with
respect to $\pi_1$ and $\pi_2$ to be satisfied by the same values of
these parameters. The condition for the consistency of the three
equations is represented by the determinantal equation

$$\begin{vmatrix} (n - m + 1)(1 - p_1)^{n-m} & (1 - p_1)^{n-m} & W_m \\ (n - m)(1 - p_1)^{n-m-1} & (1 - p_1)^{n-m-1} & W_{m+1} \\ (n - m - 1)(1 - p_1)^{n-m-2} & (1 - p_1)^{n-m-2} & W_{m+2} \end{vmatrix} = 0$$

which easily reduces to, say

$$\Phi_m(p_1) \equiv W_{m+2} (1 - p_1)^2 - 2 W_{m+1} (1 - p_1) + W_m = 0 \tag{13}$$

for m = 0, 1, 2, ..., n - 3. Since the W’s are linear combinations of
the p’s with known coefficients, equations (13) depend upon only one
unknown parameter, namely $p_1$. There are n - 2 equations (13). Thus
the elimination of $p_1$ must yield n - 3 conditions on the W’s or,
what is the same thing, on the p’s. In order to obtain these
conditions, we first impose the restriction on the W’s requiring the
existence of a value of $p_1$ which would satisfy the two equations
$\Phi_0(p_1) = 0$ and $\Phi_1(p_1) = 0$.




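This is done most conveniently by writing

$$\begin{aligned} (1 - p_1)\, \Phi_0(p_1) &= W_2 (1 - p_1)^3 - 2 W_1 (1 - p_1)^2 + W_0 (1 - p_1) = 0, \\ \Phi_0(p_1) &= W_2 (1 - p_1)^2 - 2 W_1 (1 - p_1) + W_0 = 0, \\ (1 - p_1)\, \Phi_1(p_1) &= W_3 (1 - p_1)^3 - 2 W_2 (1 - p_1)^2 + W_1 (1 - p_1) = 0, \\ \Phi_1(p_1) &= W_3 (1 - p_1)^2 - 2 W_2 (1 - p_1) + W_1 = 0. \end{aligned}$$

In order that this system of four equations could be satisfied by the
same value of $p_1$ it is necessary and sufficient that

$$\begin{vmatrix} W_2 & -2W_1 & W_0 & 0 \\ 0 & W_2 & -2W_1 & W_0 \\ W_3 & -2W_2 & W_1 & 0 \\ 0 & W_3 & -2W_2 & W_1 \end{vmatrix} = 0. \tag{14}$$

Upon expanding the determinant we obtain our first side condition, say

$$F_1 \equiv W_0^2 W_3^2 - 3 W_1^2 W_2^2 + 4 W_1^3 W_3 + 4 W_0 W_2^3 - 6 W_0 W_1 W_2 W_3 = 0. \tag{15}$$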
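
The chain from the integral form of the $u_k$ through the W's to the
first side condition can be checked numerically. The sketch below uses
exact fractions and hypothetical parameter values; the function names
are illustrative, and the model is assumed to place mass $\pi_1$ at
$p_1$, spread mass $\pi_2$ uniformly between $p_1$ and 1, and place the
remainder at p = 1.

```python
from fractions import Fraction as Fr
from math import comb


def integral(k, n, p1):
    """Exact integral of p**k * (1 - p)**(n - k) over (p1, 1), obtained
    by expanding (1 - p)**(n - k) with the binomial theorem."""
    return sum(Fr((-1) ** i * comb(n - k, i), k + i + 1)
               * (1 - p1 ** (k + i + 1))
               for i in range(n - k + 1))


def w_values(n, pi1, pi2, p1):
    """W_m = (n - m + 1) * sum_k C(m, k) * u_k for m = 0, ..., n - 1."""
    u = [pi1 * p1**k * (1 - p1) ** (n - k)
         + pi2 / (1 - p1) * integral(k, n, p1)
         for k in range(n)]
    return [(n - m + 1) * sum(comb(m, k) * u[k] for k in range(m + 1))
            for m in range(n)]


def first_side_condition(w):
    """Left-hand side of the first side condition, built from W_0..W_3."""
    w0, w1, w2, w3 = w[:4]
    return (w0**2 * w3**2 - 3 * w1**2 * w2**2 + 4 * w1**3 * w3
            + 4 * w0 * w2**3 - 6 * w0 * w1 * w2 * w3)


if __name__ == "__main__":
    n, pi1, pi2, p1 = 5, Fr(7, 10), Fr(1, 5), Fr(1, 10)
    w = w_values(n, pi1, pi2, p1)
    x = 1 - p1
    # Closed form of W_m and the quadratic relation between three W's
    print(all(w[m] == ((n - m + 1) * pi1 + pi2) * x ** (n - m) for m in range(n)))
    print([w[m + 2] * x**2 - 2 * w[m + 1] * x + w[m] for m in range(n - 2)])
    print(first_side_condition(w))
```

All three checks are exact, so any nonzero output would point to a
transcription error in one of the formulae rather than to rounding.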

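The set of additional n - 4 side conditions is obtained in one stroke
by requiring that the same value of $p_1$ satisfies a group of three
consecutive equations of type (13), namely

$$\begin{aligned} \Phi_m(p_1) &= W_{m+2} (1 - p_1)^2 - 2 W_{m+1} (1 - p_1) + W_m = 0, \\ \Phi_{m+1}(p_1) &= W_{m+3} (1 - p_1)^2 - 2 W_{m+2} (1 - p_1) + W_{m+1} = 0, \\ \Phi_{m+2}(p_1) &= W_{m+4} (1 - p_1)^2 - 2 W_{m+3} (1 - p_1) + W_{m+2} = 0, \end{aligned} \tag{16}$$

for m = 0, 1, 2, ..., n - 5.

Treating this group as a system of three equations linear in the
unknowns $(1 - p_1)^2$ and $(1 - p_1)$, the condition of consistency is
written as, say,

$$F_{m+2} \equiv \begin{vmatrix} W_{m+2} & W_{m+1} & W_m \\ W_{m+3} & W_{m+2} & W_{m+1} \\ W_{m+4} & W_{m+3} & W_{m+2} \end{vmatrix} = W_{m+2}^3 + W_m W_{m+3}^2 + W_{m+1}^2 W_{m+4} - W_m W_{m+2} W_{m+4} - 2 W_{m+1} W_{m+2} W_{m+3} = 0 \tag{17}$$

for m = 0, 1, 2, ..., n - 5.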
Combining (15) and (17) we obtain a system of n - 3 side conditions
of the form $F_t(p) = 0$, t = 1, 2, ..., n - 3. In order to see
that these conditions are independent, notice first that each $W_m$
depends on $p_0, p_1, \ldots, p_m$ with the coefficient of $p_m$
different from zero. Thus $W_{m+1}$ depends on $p_{m+1}$ while
$W_0, \ldots, W_m$ are independent of $p_{m+1}$. Furthermore, by
inspecting the functions $F_t$, it is easy to see that $F_{t+1}$
depends on $W_{t+3}$ and thus on $p_{t+3}$ while $F_1, \ldots, F_t$ are
independent of $p_{t+3}$. It follows that the n - 3 conditions
$F_t(p) = 0$ are independent. They are non-linear and, therefore, it is
necessary to replace them by their reduced forms. Further steps leading
to the BAN estimates of the parameters follow the machinery outlined in
Section (B) of this Appendix.

A general remark regarding the construction of side conditions is
in order. It is desirable to construct such side conditions of the form
$F_t(p) = 0$ that (i) every system of probabilities $p_k$ determined by
the formulae (2) satisfies these conditions, and (ii) that whenever a
system of numbers $p'_k$, k = 0, 1, 2, ..., n - 1, satisfies the
conditions $F_t(p') = 0$, then there exists a system of values of the
parameters which, substituted in formulae (2), yield the original
values $p'_k$. Ordinarily, however, formulae (2) are somewhat
complicated and it is necessary to use various devices, such as that
outlined above, to effect the elimination of the parameters. As a
result, the side conditions obtained satisfy the requirement (i) but
not necessarily (ii). As far as the asymptotic properties of the
estimates obtained are concerned, ordinarily, this is of no
consequence. However, for any fixed N, a failure to satisfy (ii) may
result in greater variability of the estimates. To see this, denote by
$S_0$ the locus of points with coordinates $p_0, p_1, \ldots, p_{n-1}$
generated by varying all the parameters $\theta$ within their natural
limits of variation. The process of minimizing the modified $\chi^2$
under the conditions satisfying (i) and (ii) is equivalent to
determining on $S_0$ the particular point, say P, which, in the sense
of $\chi^2$, is closest to the point, say Q, with coordinates
$q_0, q_1, \ldots, q_{n-1}$. The true probabilities $p_k$ determine a
point, say P*, on $S_0$. Since each of the q’s tends in probability to
the corresponding true p, when N is large then Q is close to P* and
this makes P close to P*. Now suppose that the side conditions satisfy
the requirement (i) but not (ii). Then the locus of points satisfying
the side conditions includes $S_0$ but, in addition, some other locus,
say $S_1$. The minimization of the $\chi^2$ reduces to determining the
point, say P’, on the locus $S_0 + S_1$ which is closest to Q. If N is
not large, so that Q is not very close to P*, it is possible that the
point P’ will be found




on $S_1$ rather than on $S_0$. This explains the excessive variability
of the estimates of the parameters which may be observed when the side
conditions do not satisfy the requirement (ii). The determination of
these estimates involves taking n - s of the coordinates of P’ and
using the point, say P”, on $S_0$ which has these particular n - s
coordinates. Naturally, P’ may be quite far from P*.

Unpleasant as these circumstances are, on occasion one is forced to
face them. However, when N is increased, then, eventually most of
the observable positions of Q will be so close to P* that the chance of
P’ falling on $S_1$ rather than on $S_0$ becomes negligible. This gives
an intuitive explanation why the neglect of requirement (ii) does not
in general affect the asymptotic properties of the estimates. We must,
however, point out the possibility of P* being a limiting point of
points on $S_1$. In this case also the asymptotic properties of the
estimates will be affected. While intuitively clear, these points
require a separate study.


Model II(a). The distribution G(p) is defined as 


0 if p< pi; 
G(pe) if mSp<l, 
1 if p=l1, 
and p; has the following form, 
1 1 
Pr 
for k =0,1,2,---,n. 


The method of constructing the side restrictions is similar to that 
in Model II. We follow the procedure described there until the 
expression corresponding to (12) is obtained: 


(18) + (L—m) (1— = Wn 


for m=0,1,2,--:,n—1. We may regard 7, and 1—v-, as two 
unknowns and deduce the following equations, by the method in Model II. 
(19) Wms2(1 — pi)? — (1 — pr) + Wn = 0 

for m= 0,1, -,n—3. 


We notice that (19) is exactly the same as (13). Now solve for m 
from each of the last two equations in (18) and equate the two 
expressions of z,. After simplification it is found that 


(20) (1 — — 2Wa_s(1 — pi) + Was = 0. 


DESIGN OF MASS MEDICAL SURVEYS 269 
Since $W_n = 1$, we may combine (19) and (20) into the following form:

\[
W_{m+2}(1 - p_1)^2 - 2W_{m+1}(1 - p_1) + W_m = 0 \tag{21}
\]

for $m = 0, 1, 2, \ldots, n-2$.


Following the procedure in Model II, we find the required $(n - 2)$
side restrictions, namely,

\[
W_m^2 W_{m+3}^2 + 4W_m W_{m+2}^3 - 3W_{m+1}^2 W_{m+2}^2 + 4W_{m+1}^3 W_{m+3} - 6W_m W_{m+1} W_{m+2} W_{m+3} = 0 \tag{22}
\]

for $m = 0, 1, \ldots, n-3$.

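Model III. G(p) is assumed to follow a beta distribution:

\[
G(p) = \frac{\int_0^p r^{m_1}(1 - r)^{m_2}\,dr}{\int_0^1 r^{m_1}(1 - r)^{m_2}\,dr}
\quad \text{for } 0 \le p \le 1,
\]

and

\[
p_k = C_n^k \, \frac{\int_0^1 r^{m_1+k}(1 - r)^{m_2+n-k}\,dr}{\int_0^1 r^{m_1}(1 - r)^{m_2}\,dr} \tag{23}
\]

for $k = 0, 1, 2, \ldots, n$.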


Express the right side of each equation in (23) in terms of the gamma
functions, then divide both sides by $C_n^k$. It is found that

\[
\frac{u_{k+1}}{u_k} = \frac{m_1 + k + 1}{m_2 + n - k}
\]

or

\[
m_2 u_{k+1} - m_1 u_k + (n - k)u_{k+1} - (k + 1)u_k = 0 \tag{24}
\]

for $k = 0, 1, \ldots, n-1$.


For a unique solution of $m_1$ and $m_2$, the determinant of the matrix
formed with coefficients of $m_1$ and $m_2$ and the constant terms of every
three consecutive equations of (24) must be zero. By expanding the
determinants, the following side restrictions are then formed:


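\[
\begin{vmatrix}
-u_k & u_{k+1} & (n - k)u_{k+1} - (k + 1)u_k \\
-u_{k+1} & u_{k+2} & (n - k - 1)u_{k+2} - (k + 2)u_{k+1} \\
-u_{k+2} & u_{k+3} & (n - k - 2)u_{k+3} - (k + 3)u_{k+2}
\end{vmatrix} = 0
\]

for $k = 0, 1, 2, \ldots, n-3$.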
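The determinant condition described above can be illustrated with exact arithmetic. The sketch below is an illustration under stated assumptions, not the author's calculation: it takes $u_k$ to be $p_k$ divided by $C_n^k$, restricts $m_1$ and $m_2$ to non-negative integers so that the beta integrals reduce to factorials, and checks that both (24) and the determinant formed from three consecutive equations of (24) vanish. The values `n = 6`, `m1 = 2`, `m2 = 3` and all function names are hypothetical.

```python
from fractions import Fraction
from math import factorial


def beta_integral(a, b):
    """Exact integral of r^a (1 - r)^b over [0, 1] for integers a, b >= 0."""
    return Fraction(factorial(a) * factorial(b), factorial(a + b + 1))


def u_values(n, m1, m2):
    """u_k = p_k / C(n, k) under the beta mixing distribution, k = 0..n."""
    norm = beta_integral(m1, m2)
    return [beta_integral(m1 + k, m2 + n - k) / norm for k in range(n + 1)]


def det3(a):
    """Determinant of a 3 x 3 matrix given as a list of rows."""
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))


def side_restriction(u, n, k):
    """Determinant built from equations k, k + 1, k + 2 of (24).

    Each row holds the coefficient of m1, the coefficient of m2,
    and the constant term of one equation.
    """
    rows = [[-u[j], u[j + 1], (n - j) * u[j + 1] - (j + 1) * u[j]]
            for j in (k, k + 1, k + 2)]
    return det3(rows)


# Hypothetical illustration values (not taken from the paper).
n, m1, m2 = 6, 2, 3
u = u_values(n, m1, m2)

# Left side of (24) for k = 0..n-1; each entry should be zero.
residuals_24 = [m2 * u[k + 1] - m1 * u[k] + (n - k) * u[k + 1] - (k + 1) * u[k]
                for k in range(n)]

# The n - 2 determinants for k = 0..n-3; each should be zero.
restrictions = [side_restriction(u, n, k) for k in range(n - 2)]

print(residuals_24)
print(restrictions)
```

The determinants vanish because every equation of (24) is linear in $(m_1, m_2, 1)$, so any three of them have a common non-trivial solution only if their coefficient matrix is singular.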

Model IV. G(p) is defined in the following table: 


2 
t 


1 
p 0 


t 


Since $j$ and $t$ are known numbers, the $p$'s are linear functions of the $\pi$'s.
For $n = 5$ and $t = 4$,


4 

(25) fork —=0,1 5 
j=0 4 4 


By eliminating the unknown $\pi$'s from (25), the required side restriction is

\[
F(p) = 43p_1 - 93p_2 + 93p_3 - 43p_4 = 0. \tag{26}
\]


Since (26) is a linear equation in p’s, no approximation is necessary. 


(D) Power of the $\chi^2$ test


The machinery of computing the approximate power of the $\chi^2$ test
is as follows: Let H be the hypothesis tested under conditions described
in (A) and let H' be a simple hypothesis, alternative to H, with respect
to which it is desired to compute the power, say $\beta(H')$. Then $\beta(H')$
is the probability, computed under the assumption that H' is true, that
the $\chi^2$ test will reject H.

Denote by $p^*_k$ the value of the probability $p_k$ as specified by H'.
Further, let $\chi_0^2(q)$ stand for the minimum $\chi^2$ computed under the side
conditions implied by H. Thus,

\[
\chi_0^2(q) = N \sum_{k=0}^{n} \frac{[q_k - p_k(q)]^2}{q_k},
\]

where $p_k(q)$ stands for the result of substituting into expression (4)
the BAN estimates of all the parameters $\theta_1, \theta_2, \ldots, \theta_{n-s}$. In order to
obtain the value of the power $\beta(H')$, substitute $p^*_k$ for $q_k$,
$k = 0, 1, 2, \ldots, n$, in the expression of $\chi_0^2(q)$ to obtain, say,

\[
\chi_0^2(p^*) = N \sum_{k=0}^{n} \frac{[p^*_k - p_k(p^*)]^2}{p^*_k},
\]
and enter the tables of Dr. Fix. Naturally the power $\beta(H')$ depends
on the level of significance used. It depends also on the number of
degrees of freedom in $\chi^2$. This is equal to the number $s$ of side
conditions used in minimizing the $\chi^2$ or, to put it differently, to the
number of independent equations by means of which the hypothesis tested is
expressed. The tables of Dr. Fix should be entered in accordance with
both the level of significance and the number of degrees of freedom.
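As a rough illustration of the procedure just described, the following sketch computes an approximate power for the case of a single linear side condition, so that $s = 1$. Several things here are assumptions and not part of the paper: the $\chi^2$ is taken in the form with observed frequencies in the denominator; for one linear restriction $\sum c_k p_k = 0$ its minimum is obtained in closed form by Lagrange multipliers as $N m^2/(v - m^2)$, with $m = \sum c_k q_k$ and $v = \sum c_k^2 q_k$; the noncentral $\chi^2$ with one degree of freedom is evaluated through the normal distribution instead of printed tables. The coefficients `c` are those printed in (26), while the alternative `p_star`, the sample sizes and the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def noncentrality(p_star, c, N):
    """Minimum chi-square with p* substituted for q, one linear restriction.

    Assumed closed form: N * m^2 / (v - m^2), where
    m = sum(c_k p*_k) and v = sum(c_k^2 p*_k).
    """
    m = sum(ck * pk for ck, pk in zip(c, p_star))
    v = sum(ck * ck * pk for ck, pk in zip(c, p_star))
    return N * m * m / (v - m * m)


def power_1df(lam, alpha):
    """Upper tail of noncentral chi-square, 1 degree of freedom.

    Uses the identity: chi'^2 > crit^2  <=>  |Z + sqrt(lam)| > crit,
    with Z standard normal and crit the two-sided normal critical value.
    """
    z = NormalDist()
    crit = z.inv_cdf(1 - alpha / 2)
    root = sqrt(lam)
    return (1 - z.cdf(crit - root)) + z.cdf(-crit - root)


# Coefficients of p_0..p_5 as printed in (26); p_0 and p_5 do not enter.
c = [0, 43, -93, 93, -43, 0]

# Hypothetical simple alternative H' (probabilities sum to 1).
p_star = [0.50, 0.20, 0.10, 0.08, 0.07, 0.05]

results = []
for N in (250, 1000, 4000):          # hypothetical sample sizes
    lam = noncentrality(p_star, c, N)
    results.append((N, lam, power_1df(lam, 0.05)))

for N, lam, power in results:
    print(N, round(lam, 3), round(power, 3))
```

When the alternative satisfies the side condition exactly, `m` is zero, the noncentrality vanishes and the computed power reduces to the level of significance, as it should.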


REFERENCES 


[1] Muench, H. The probability distribution of protection test results. Journal
of the American Statistical Association, Vol. 31 (1936), pp. 677-689.

[2] Neyman, J. Outline of statistical treatment of the problem of diagnosis.
Public Health Reports, Vol. 62 (1947), pp. 1449-1456.

[3] Neyman, J. First Course in Probability and Statistics. Henry Holt, New
York, 1950.

[4] Neyman, J. Contribution to the theory of the χ² test. Proceedings of the
Berkeley Symposium on Mathematical Statistics and Probability (1949),
pp. 239-273.

[5] Birkelo, C. C., W. E. Chamberlain, P. S. Phelps, P. E. Schools, D. Zacks,
and J. Yerushalmy. Tuberculosis case finding. Journal of American
Medical Association, Vol. 133 (1947), pp. 359-365.

[6] Yerushalmy, J. Statistical problems in assessing methods of medical
diagnosis, with special reference to X-ray techniques. Public Health
Reports, Vol. 62 (1947), pp. 1432-1449.

[7] Fix, Evelyn. Tables of noncentral χ². University of California Publication
in Statistics, Vol. 1, No. 2, pp. 15-19.

[8] Eisenhart, C. Power function of K. Pearson's χ²-test. Ph. D. dissertation,
University of London, 1937 (unpublished).

[9] Eisenhart, C. The power function of the χ²-test. Bulletin of the American
Mathematical Society, Vol. 44 (1938), p. 32.

[10] Wald, A. Tests of statistical hypotheses concerning several parameters
when the number of observations is large. Transactions of the American
Mathematical Society, Vol. 54 (1943), pp. 426-482.

